Showing results for tags 'karpenter'.

Found 9 results

  1. This is a guest post by Pranav Kapoor, Head of DevOps at Upstox, co-authored with Jayesh Vartak, Solutions Architect at AWS, and Jitendra Shihani, Technical Account Manager (TAM) at AWS.

Upstox is India's largest investech, a multi-unicorn valued at $3.5 billion. It allows you to buy and sell stocks, mutual funds, and derivatives, and is loved and trusted by over 12 million customers. It is backed by Mr. Ratan Tata and Tiger Global and is the official partner for the Tata IPL (Indian Premier League). Upstox experienced 10x growth during the pandemic, with the number of users increasing from 1 million to over 10 million in 2022. To sustain this exponential growth and prepare for future expansion, Upstox set high standards for running its trading platform. These standards cover availability, scalability, security, operational efficiency, and cost optimization:

• Availability: Raise the availability SLA from 99.9% to at least 99.99%.
• Scalability: Reduce scaling lag and provide superior performance even during sudden bursts of traffic, such as market opening hours, budget day announcements, or market news.
• Security: Implement more guardrails to improve the security posture, with a focus on data privacy, handling of Personally Identifiable Information (PII), and storing and processing customer data in a highly secured way, while also streamlining auditing and compliance processes.
• Operational efficiency: Create new infrastructure or environments in an automated way using Infrastructure as Code (IaC), and incorporate chaos engineering into the release lifecycle.
• Cost optimization: Reduce infrastructure and operational cost without compromising availability, security, and scalability.

To meet these targets, Upstox embarked on the journey of building a NextGen platform called "Greenfield". Upstox chose Amazon Elastic Kubernetes Service (Amazon EKS) as the core compute platform for Greenfield to leverage the benefits of containers, whereas the earlier platform ran on Amazon Elastic Compute Cloud (Amazon EC2). In this post, we share the tenets followed to build Greenfield, its core differentiators, and the outcomes.

Upstox Greenfield philosophy

To build a future-proof architecture that can handle further exponential growth in traffic while retaining the flexibility to evolve over time, we used the following tenets:

• Security: Security is "job zero" at AWS and Upstox. It is the top priority in every aspect of the platform, following least-privilege access, mandatory encryption for data at rest and in transit, no sharing of SSH keys, and mandatory AWS Identity and Access Management (IAM) role-based access (no sharing of IAM access keys and secret access keys).
• Customer experience: The architecture is focused on availability, performance, scalability, and resiliency to deliver the best customer experience. We set the simple principle that any server belonging to any service can be terminated at any point in time without impacting the customer experience.
• Smart defaults: Simplicity is key to the long-term success of the platform. Kubernetes today offers a wide range of capabilities, but using all of them can quickly make the platform complex. Therefore, we decided to keep the platform simple yet powerful by selecting only a few Kubernetes capabilities. Additionally, to keep the platform simple, we followed the principle of building services with smart defaults.
The idea is that all services come with default settings, so that novice or new users can easily run their applications without getting into the complexities of the platform, while advanced users can customize the defaults for their respective use cases. For example, the default for the minimum number of pods is two, and zone topology constraints spread the pods across multiple Availability Zones (AZs), so that the application is highly available by default. To avoid a single point of failure, application teams can't override the minimum number of pods to less than two; they can only override it to more than two. Keeping the platform simple, along with smart defaults, has enabled all teams, such as development, QA, InfoSec, and Operations, to focus on understanding and leveraging a few yet powerful capabilities and to become proficient in them.

• NoOps: The goal is to eliminate or minimize manual activities as much as possible. This involves having continuous integration/continuous deployment (CI/CD) pipelines not only for applications but also for infrastructure (IaC). A test-first approach, along with chaos engineering, is also vital to achieving NoOps. The objective is to empower teams with a self-service, automated platform, keeping human activities to a minimum. For example, if a team wants to deploy an application and InfoSec needs to approve it, then the human touch-point should be limited to the InfoSec team, eliminating the touch-point with the DevSecOps team. To keep the focus on automating everything, the DevSecOps team comprises mostly developers, so that they spend most of their time automating the platform rather than on day-to-day operations.
• Cost optimized: We wanted to build a cost-optimized platform. Along with the core cost optimization levers, such as right sizing, auto scaling, AMD-powered instances, AWS Graviton, Spot instances, Savings Plans, and Reserved Instances, the platform also focuses on leveraging service features (such as S3 Intelligent-Tiering) and re-architecting applications for cost optimization.

Approach

It took about a year to build the next-gen trading platform. In the planning phase, we evaluated the application portfolio along various dimensions, such as criticality, dependencies, complexity, containerization effort, testing effort, current cost, and roadmap. Based on this evaluation, we categorized the applications into five buckets:

1. Two representative applications for identified capabilities
2. Two mission-critical applications
3. Top cost-contributing applications
4. Remaining to-be-containerized applications
5. Applications to be retained (not to be containerized)

Then, we migrated the applications in phases as follows:

Phase 1 – Two representative applications for identified capabilities: In the first phase, we selected a few representative applications that were good candidates for migration to Amazon EKS. This migration was successfully completed, allowing us to validate critical features, such as gRPC, WebSockets, and load balancing. It helped the team learn Docker, Kubernetes, and Amazon EKS. The team understood the nuances and became confident in migrating and running containers.

Phase 2 – Two mission-critical applications: In the second phase, we focused on the top two mission-critical applications. These applications provide the exchange's data feed to end-users in real time and also power the charts and graphs.
These applications are critical for end-users making trading decisions. The team resolved all the challenges in migrating and running these mission-critical applications and learned the specifics of running them on Amazon EKS. This experience streamlined the later phases, as the majority of the use cases for forthcoming applications had already been addressed through the initial two representative and two mission-critical applications.

Phase 3 – Top cost-contributing applications: In the third phase, we picked up the top cost-contributing applications, which were typically the complex ones. There were multiple benefits: apart from realizing cost savings early, we increased the agility, resiliency, scalability, and performance of these applications.

Phase 4 – Remaining to-be-containerized applications: In the fourth phase, we migrated the rest of the applications to Amazon EKS. With the learnings and insights gained in the earlier phases, we were able to expedite the migration process significantly.

We evaluated the applications for the fifth phase and concluded that containerization did not present a favorable cost-benefit ratio, due to factors such as the effort involved relative to the advantages and the future strategic plans for these applications. Therefore, we retained these applications as is.

Greenfield differentiation

Scaling based on traffic pattern: Upstox did an in-depth analysis of daily traffic patterns. On a typical day, there's an exponential spike in traffic during the market's opening hour, approximately from 9:00 AM to 9:30 AM. Then there is another surge in traffic before the market's closing hour. Traffic varies throughout trading hours, whereas it is very low during non-trading hours. This traffic pattern is shown in the following diagram.

Figure: Scaling based on traffic pattern

However, on a non-typical day, traffic varies based on market news and events such as budget day announcements. Considering these aspects, Upstox implemented scaling along multiple dimensions: time-based, request-based, and memory/CPU-based scaling. To further reduce or eliminate scaling lag, we leveraged the technique "Eliminate Kubernetes node scaling lag with pod priority and over-provisioning".

Application Load Balancer pre-warming: Since traffic spikes suddenly within a short span of time during market opening hours, the Application Load Balancer (ALB) also needs to scale accordingly. The ALB can scale to handle up to double the traffic in the next five minutes, but the increase during market opening hours is much higher than the ALB can absorb on its own. To address this challenge, Upstox has been using ALB pre-warming, in which higher capacity (LCUs) is pre-provisioned to handle the sudden spike in traffic.

Karpenter: Upstox implemented the Karpenter autoscaler instead of Cluster Autoscaler to scale the Amazon EKS worker nodes in line with the traffic patterns. We used the following key features of Karpenter:

• Instance type selection: We used a broad set of EC2 instance types to mitigate the risk of instance unavailability and to optimize cost.
• Node recycling: The Amazon EKS cluster's worker nodes use Amazon EKS-optimized Amazon Linux AMIs to run containers securely and performantly. To make sure the worker nodes are always up to date with the latest security patches and fixes, we adopted a routine cycle of replacing them with the most recent Amazon EKS-optimized Amazon Machine Images (AMIs).
Therefore, by replacing worker nodes instead of doing in-place OS updates on existing worker nodes, we aligned with the best practice of treating infrastructure as immutable. We used the following NodePool parameter to continuously replace nodes with the latest Amazon EKS-optimized AMI:

  spec.disruption.expireAfter

• Consolidation: We further optimized cost by scaling in the worker nodes in line with the load. We used the following NodePool parameter to downsize over-provisioned nodes and to reduce the number of nodes as load decreases and pods are removed (a sketch combining both disruption fields appears at the end of this item):

  spec.disruption.consolidationPolicy

Security: The Greenfield platform is highly secured, with a focus on data security, data privacy, access control, and compliance. Upstox leveraged the AWS Encryption SDK to manage PII data. Upstox also leveraged AWS Systems Manager extensively, as follows:

• Systems Manager Session Manager: We managed SSH access using Session Manager and followed the least-access principle using IAM policies and roles, eliminating the sharing of SSH keys.
• Systems Manager Patch Manager: We used Patch Manager to periodically apply patches to the EC2 instances outside of the Amazon EKS clusters.
• Systems Manager Compliance: We used Compliance to generate audit reports, such as patch compliance data.

Additional cost optimizations: To further optimize cost, we followed these advanced cost optimization techniques:

• Right sizing: As we containerized the applications, based on load and performance testing, we chose the right size and type of EC2 instances to get the best performance at the lowest cost.
• AMD-powered instances for applications: AMD-powered EC2 instances let customers run general-purpose, memory-intensive, burstable, compute-intensive, and graphics-intensive workloads at a significant price advantage relative to comparable offerings. By adopting AMD-powered Amazon EKS EC2 instances, Upstox achieved an impressive cost reduction of approximately 40-45% in the AWS Mumbai Region.
• Spot instances: EC2 Spot instances provide up to 90% cost savings compared to On-Demand instances. Upstox migrated the Amazon EKS dev and staging environments to Spot instances, with fallback to On-Demand instances if Spot instances were unavailable. This resulted in an overall 70% cost savings for dev and staging environments.
• Graviton instances for managed services: The AWS Graviton processor offers up to 40% price-performance benefits, and most managed services can be updated to use it. Therefore, we leveraged Graviton processors for managed services such as Amazon ElastiCache and Amazon Aurora.
• Amazon EBS gp3: Amazon EBS gp3 volumes offer up to 20% lower cost than previous-generation gp2 volumes and also allow scaling performance independently of storage capacity. We migrated all EBS volumes from gp2 to gp3, lowering cost by approximately 18% while getting better performance.
• Amazon S3: With exponential growth in the business, data volumes and costs also increased significantly.
Upstox leveraged S3 Storage Lens to optimize Amazon Simple Storage Service (Amazon S3) costs, as outlined in the case study "Upstox Saves $1 Million Annually Using Amazon S3 Storage Lens".

Architectural roadmap

As part of the architecture evolution, Upstox is considering the following roadmap:

• Multi-architecture CPU: Upstox is considering a multi-architecture CPU strategy, x86 and ARM, for all of its workloads by leveraging Graviton instances along with AMD instances. This diversification ensures a wider selection of instances, mitigating the risk of instance shortages. For example, if AMD instances are not available, then the workload scales using Graviton instances, and vice versa. In turn, this improves resiliency while keeping cost optimized.
• Auto scaling for databases: Currently, databases are provisioned for peak capacity and there is no auto scaling. Upstox is planning to implement auto scaling for database read replicas, so that the number of read replicas is automatically adjusted based on load, further optimizing cost.
• Combination of On-Demand and Spot instances for production workloads: Currently, the production environment uses On-Demand instances; the plan is to use a combination of On-Demand and Spot instances, which can further reduce cost.

Conclusion

In this post, we showed the detailed journey of Upstox in developing the Greenfield platform, a transformative project that has significantly enhanced customer experience, agility, security, and operational efficiency, all while reducing costs. The platform has not only shortened the release lifecycle, enabling faster delivery of new use cases, but also lowered operational costs despite handling increased volumes. This was achieved without compromising security, scalability, or performance. The Upstox initiative stands as a testament to how thoughtful innovation and strategic investment in technology can lead to substantial business benefits. View the full article
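For reference, here is a minimal sketch of a v1beta1 NodePool combining the two disruption fields called out above. It is illustrative only, not Upstox's actual configuration; the resource name, node class reference, and expiry value are assumptions:

  apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    name: default              # illustrative name
  spec:
    template:
      spec:
        nodeClassRef:
          name: default        # illustrative EC2NodeClass reference
    disruption:
      # Replace nodes after 30 days so they are recreated from the latest
      # Amazon EKS-optimized AMI (immutable-infrastructure node recycling)
      expireAfter: 720h
      # Consolidate and scale in nodes as load decreases and pods are removed
      consolidationPolicy: WhenUnderutilized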
  2. Introduction

Karpenter is a Kubernetes node lifecycle manager created by AWS, initially released in 2021 with the goal of minimizing cluster node configurations. Over the past year, it has seen tremendous growth, reaching over 4,900 stars on GitHub and merging code from more than 200 contributors. It is in the process of being donated to the Cloud Native Computing Foundation (CNCF) as part of the Kubernetes Autoscaling Special Interest Group.

As part of this growth, there is a growing need for Karpenter's APIs to mature and to offer more stringent stability guarantees to users who don't want to deal with the number of breaking changes the project has made in its alpha state. This marks a significant milestone in the project's evolution. With this transition, customers benefit from the increased level of maturity and API stability that the beta version offers. It also marks a commitment from us to prioritize backward compatibility, which means customers can confidently adopt new features and enhancements without the worry of disruptive changes down the line. This release, like previous releases, incorporates feedback from the open-source community.

The API changes are being rolled out as part of the Karpenter version 0.32.0 release. Existing deployments need to be upgraded to this version, following the migration path outlined in this post and further detailed in the Karpenter upgrade guide. The existing alpha APIs are now deprecated and remain available only in this single version. Starting with release v0.33.0, Karpenter will only support its v1beta1 APIs.

Karpenter APIs follow a maturity progression of alpha → beta → stable. The graduation from alpha to beta required significant changes to the APIs, which are highlighted in this post. We don't anticipate that the graduation from beta to stable will require the same level of changes. If you're curious about the Kubernetes API graduation process, then please see this post.

What is changing

In the journey to get to a stable v1, we've made significant changes to our APIs from alpha to beta to improve ease of use by dropping the areas of the APIs that commonly gave users problems. One of these areas was naming, where we saw confusion around the use of the word provisioner (an overloaded term in the realm of storage) and generally wanted to reduce the number of concepts that users had to reason about. With this release, Karpenter deprecates the Provisioner, AWSNodeTemplate, and Machine APIs, while introducing NodePool, EC2NodeClass, and NodeClaim. We've taken a holistic view and streamlined the APIs around the single concept of Node.

Walkthrough

API group and kind naming

The v1beta1 version introduces the following new APIs while deprecating the existing ones:

• karpenter.sh/Provisioner becomes karpenter.sh/NodePool
• karpenter.sh/Machine becomes karpenter.sh/NodeClaim
• karpenter.k8s.aws/AWSNodeTemplate becomes karpenter.k8s.aws/EC2NodeClass

Each of these naming changes comes with schema changes that need to be considered as you update to the latest version of Karpenter. Let's look at each change and what the new API definition looks like.

v1alpha5/Provisioner → v1beta1/NodePool

NodePool serves as the successor to Provisioner, exposing configuration-based parameters that impact the compatibility between Nodes and Pods during scheduling (such as requirements, taints, and labels). It also encompasses behavior-based settings for fine-tuning Karpenter's scheduling and deprovisioning decision-making.
A pool resolves to a mix of instance types and sizes, while still enforcing limits on how workloads request resources. It facilitates the grouping of provisioning and deprovisioning behavior. Importantly, a pool shouldn't have any cloud-specific configuration, to keep the configuration portable. In Karpenter v1beta1, all non-behavioral fields are encapsulated within the NodePool template field. In the case of Karpenter, NodePools template NodeClaims, which are then orchestrated by the Karpenter controller. This mirrors the concept of Deployments, where Pods are templated and orchestrated by the deployment controller. You can read more about NodePools in the documentation. An example NodePool looks something like this:

  apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  ...
  spec:
    template:
      metadata:
        annotations:
          custom-annotation: custom-value
        labels:
          team: team-a
          custom-label: custom-value
      spec:
        nodeClassRef:
          name: default
        requirements:
          - key: karpenter.k8s.aws/instance-category
            operator: In
            values: ["c", "m", "r"]
        ...
        kubelet:
          systemReserved:
            cpu: 100m
            memory: 100Mi
            ephemeral-storage: 1Gi
          maxPods: 20
    disruption:
      expireAfter: 360h
      consolidationPolicy: WhenUnderutilized

When you look at the example specification, you'll notice a new section called disruption. This groups the previous settings for consolidation, expiration, and empty nodes (ttlSecondsAfterEmpty, ttlSecondsUntilExpired, and consolidation.enabled). Karpenter sets defaults for the disruption configuration if it isn't specified when applying the NodePool manifest. The default values are listed below, and you can read more about the behavior of these fields in the documentation.

  Field                                 Default
  spec.disruption.consolidationPolicy   WhenUnderutilized
  spec.disruption.expireAfter           720h

v1alpha1/AWSNodeTemplate → v1beta1/EC2NodeClass

EC2NodeClass serves as the successor to AWSNodeTemplate, exposing cloud-provider-specific fields that affect the launch and bootstrap process for a Node, including: the Amazon Machine Image (AMI), security groups, and subnets you want to use, as well as details about block storage, user data, and instance metadata settings. The Karpenter spec.instanceProfile field has been removed from the EC2NodeClass in favor of the spec.role field; Karpenter now auto-generates the instance profile given the role that you specify. The spec.launchTemplateName field, which was already deprecated, for referencing unmanaged launch templates within Karpenter has been removed. If you are still using it, then you need to migrate to Karpenter-managed launch templates using EC2NodeClass. You can read more about EC2NodeClass in the documentation. An example EC2NodeClass looks something like this (note that the API group is karpenter.k8s.aws, per the naming changes above):

  apiVersion: karpenter.k8s.aws/v1beta1
  kind: EC2NodeClass
  metadata:
    name: default
  spec:
    amiFamily: Bottlerocket
    role: KarpenterNodeRole-karpenter-demo
    subnetSelectorTerms:
      - tags:
          karpenter.sh/discovery: karpenter-demo
    securityGroupSelectorTerms:
      - tags:
          karpenter.sh/discovery: karpenter-demo
    tags:
      test-tag: test-value

v1alpha5/Machine → v1beta1/NodeClaim

In Karpenter v0.28.0, a new type called Machine was added. This enabled multiple node provisioning improvements that allow native Kubernetes controllers to join nodes to the cluster while still letting Karpenter manage and track the node. If you're on a version of Karpenter before v0.28.0, then you won't have this resource type. With release v0.32.0, this has changed to NodeClaim.
NodeClaims aren't intended to be created by cluster operators; instead, they're created and deleted by Karpenter. You shouldn't have to make any changes for NodeClaims to work, but if you're troubleshooting a node in a cluster, this is a great place to see the lifecycle and health of a node as Karpenter manages it.

Label changes

Karpenter v1beta1 introduces changes to the common labels karpenter.sh/do-not-evict and karpenter.sh/do-not-consolidate, which have been deprecated and unified under the single label karpenter.sh/do-not-disrupt. This label can be applied to both Pods and Nodes and prevents Karpenter from performing node disruption and Pod eviction.

More flexible selectors for AMIs, subnets, and security groups in NodeClass

The previous selector settings were somewhat limited in their capacity to identify and use different settings for nodes being provisioned. The existing behavior applied AND logic, which made it harder to match settings across various clusters and regions. To address this, we've extended the selectors, giving you the capability to specify multiple terms. These terms are combined using OR logic, meaning they're evaluated together until a match is identified. An example for matching an AMI with name my-name1 or my-name2, and owner 123456789 or amazon, would look like this:

  amiSelectorTerms:
    - name: my-name1
      owner: 123456789
    - name: my-name2
      owner: 123456789
    - name: my-name1
      owner: amazon
    - name: my-name2
      owner: amazon

Similar settings can be made for subnetSelectorTerms and securityGroupSelectorTerms, which you can read more about in the Karpenter documentation.

  securityGroupSelectorTerms:
    # Selector terms are ORed
    - id: abc-123
      name: default-security-group   # Not the same as the name tag
      tags:
        key: value
    - id: abc-123
      name: custom-security-group    # Not the same as the name tag
      tags:
        key: value

Drift enabled by default

Starting from the next release (v0.33.0), the drift feature will be enabled by default. If you don't specify the Drift feature gate, the feature is assumed to be enabled. You can disable the drift feature by specifying --feature-gates DriftEnabled=false in the command line arguments to Karpenter. This feature gate is expected to be fully dropped when the core APIs (NodePool, NodeClaim) are bumped to v1.

Migration path

Update the Karpenter controller AWS IAM role

The Karpenter controller uses an AWS Identity and Access Management (AWS IAM) role to grant the permissions needed to launch and operate Amazon Elastic Compute Cloud (Amazon EC2) instances in your AWS account. As part of the upgrade, create a new permission policy that does the following:

• Scopes the ec2:RunInstances, ec2:CreateFleet, and ec2:CreateLaunchTemplate permissions down to the tag-based constraint karpenter.sh/nodepool instead of the previous tag key karpenter.sh/provisioner-name.
• Grants permissions for the actions iam:CreateInstanceProfile, iam:AddRoleToInstanceProfile, iam:RemoveRoleFromInstanceProfile, iam:DeleteInstanceProfile, and iam:GetInstanceProfile. All of these permissions (with the exception of iam:GetInstanceProfile) are constrained by a tag-based policy that ensures the controller only has permission to operate on instance profiles that it was responsible for creating. These are needed to support Karpenter-managed instance profiles.

Once the migration is complete and you've rolled out the new nodes as described below, you can safely remove the previous permission policy.
An example of the permission policy is available in the Karpenter GitHub repository, and it is distributed as part of the project's getting-started AWS CloudFormation template.

API migration

To transition from the alpha APIs to the new v1beta1 APIs, first install the new v1beta1 Custom Resource Definitions (CRDs). Subsequently, generate the beta equivalent of each alpha API for both Provisioners and AWSNodeTemplates. It's worth noting that the migration from Machine to NodeClaim is managed by Karpenter as you transition your custom resources from Provisioners to NodePools, and it remains seamless for users.

We're happy to introduce karpenter-convert, a command line utility designed to streamline the creation of NodePool and EC2NodeClass objects. The following steps show how to use the tool.

Install the command line utility:

  go install github.com/aws/karpenter/tools/karpenter-convert/cmd/karpenter-convert@latest

Migrate each Provisioner into a NodePool:

  karpenter-convert -f provisioner.yaml > nodepool.yaml

Migrate each AWSNodeTemplate into an EC2NodeClass:

  karpenter-convert -f nodetemplate.yaml > nodeclass.yaml

For each EC2NodeClass generated by the tool, you need to manually specify the AWS role. The tool leaves a placeholder $KARPENTER_NODE_ROLE, which you need to replace with your actual role name. For each Provisioner resource, you need to decide whether you want to roll nodes one at a time or roll all of a Provisioner's nodes at once. A detailed step-by-step guide is provided in the following section.

Periodic rolling with drift

With drift enabled, for each Provisioner in your cluster, perform the following actions:

1. Migrate your alpha CRDs to v1beta1.
2. Add a taint to the old Provisioner, such as karpenter.sh/legacy=true:NoSchedule (a sketch of this appears at the end of this item).
3. Karpenter drift marks all machines/nodes owned by that Provisioner as drifted.
4. Karpenter drift launches replacements for the nodes in the new NodePool resource.

Currently, Karpenter only supports rolling one node at a time, which means that it may take some time for Karpenter to completely roll all nodes under a single Provisioner.

Forced deletion

For each Provisioner in your cluster, perform the following actions:

1. Create a NodePool/EC2NodeClass in your cluster that is the v1beta1 equivalent of the v1alpha5 Provisioner/AWSNodeTemplate.
2. Delete the old Provisioner with: kubectl delete provisioner <provisioner-name> --cascade=foreground
3. Karpenter deletes each Node owned by the Provisioner, draining all nodes simultaneously and launching nodes for the newly pending pods as soon as the Nodes enter a draining state.

Manual rolling

For each Provisioner in your cluster, perform the following actions:

1. Create a NodePool/EC2NodeClass in your cluster that is the v1beta1 equivalent of the v1alpha5 Provisioner/AWSNodeTemplate.
2. Add a taint to the old Provisioner, such as karpenter.sh/legacy=true:NoSchedule.
3. Delete each node owned by the Provisioner, one at a time, by running: kubectl delete node <node-name>

Conclusion

In this post, we showed you the modifications introduced by the new APIs and provided insight into the reasoning behind these changes, which have been shaped by feedback from the community. We're thrilled to witness the growing maturity of the Karpenter project. We anticipate that the majority of these changes will eventually move to the stable v1 API, which will enable a broader user base to take full advantage of Karpenter's capabilities in workload-native node provisioning.
There are some other deprecations and changes that we didn't cover in this post. Please head to the Karpenter upgrade guide for comprehensive migration guidelines. Before you upgrade Karpenter to v0.32.0, we recommend reading the full release notes. If you have any questions, then please feel free to reach out in the Kubernetes Slack #karpenter channel or on GitHub, where we welcome feedback that helps us prioritize and develop new features. View the full article
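As a companion to the migration steps above, here is a minimal sketch of tainting an old v1alpha5 Provisioner so that new pods stop scheduling onto its nodes. The Provisioner name is an illustrative assumption; the taint key, value, and effect follow the steps described above:

  apiVersion: karpenter.sh/v1alpha5
  kind: Provisioner
  metadata:
    name: old-provisioner        # illustrative name
  spec:
    # Nodes launched by this Provisioner now carry this taint, so pending pods
    # are scheduled onto the replacement v1beta1 NodePool instead
    taints:
      - key: karpenter.sh/legacy
        value: "true"
        effect: NoSchedule
    ...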
  3. Introduction

Cluster Autoscaler has been the de facto industry-standard autoscaling mechanism on Kubernetes since the platform's early versions. However, with the evolving complexity and number of containerized workloads, our customers running on Amazon Elastic Kubernetes Service (Amazon EKS) started to ask for a more flexible way to allocate compute resources to pods, along with flexibility in instance size and heterogeneity. We addressed those needs with Karpenter, a product that automatically launches just the right compute resources to handle your cluster's applications. Karpenter is designed to take full advantage of Amazon Elastic Compute Cloud (Amazon EC2). Although they serve the same purpose, Cluster Autoscaler and Karpenter take very different approaches to autoscaling. In this post, we won't focus on the differences between the two solutions; instead, we'll analyze how each can be used to fulfill a specific requirement: scaling an Amazon EKS cluster to zero nodes.

Scaling an Amazon EKS cluster to zero nodes can be useful for a variety of reasons. For example, you might want to scale your cluster down to zero nodes when there is no traffic, or when you are performing maintenance. This not only reduces costs, but also improves the sustainability of resource utilization.

Solution overview

Cost considerations of scaling down clusters

The cost optimization pillar of the AWS Well-Architected Framework includes a specific section on the financial advantages of implementing a just-in-time supply strategy. Autoscaling is often the preferred approach for matching supply with demand.

Figure 1: Adjusting capacity as needed.

Autoscaling in Amazon EKS

When it comes to Amazon EKS, we need to think of control plane autoscaling and data plane autoscaling as two separate concerns. When Amazon EKS launched in 2018, the goal was to reduce users' operational overhead by providing a managed control plane for Kubernetes. Initially, this included automated upgrades, patches, and backups, but with fixed capacity. An Amazon EC2-backed data plane (with the exception of AWS Fargate) is not fully managed by AWS. Managed node groups reduce the operational burden by automating the provisioning and lifecycle management of nodes; however, upgrades, patches, backups, and autoscaling remain the responsibility of the user. There are different ways to run Amazon EKS nodes (Amazon EC2 instances, AWS Fargate, or AWS Outposts); in this post, we'll cover data plane autoscaling for Amazon EKS nodes running on Amazon EC2.

Before we go any further, let's take a closer look at how Kubernetes traditionally handles autoscaling for pods and nodes.

Autoscaling pods

In Kubernetes, pod autoscaling is handled by the Horizontal Pod Autoscaler (HPA), which automatically updates a workload resource (such as a Deployment or StatefulSet) with the aim of automatically scaling the workload to match demand. Horizontal scaling means that the response to increased load is to deploy more pods. This is different from vertical scaling, which, for Kubernetes, means assigning more resources (e.g., memory or central processing units [CPUs]) to the pods that are already running for the workload.

Figure 2: Autoscaling pods with the Horizontal Pod Autoscaler.
When the load decreases and the number of pods is above the configured minimum, the Horizontal Pod Autoscaler instructs the workload resource (i.e., the Deployment, StatefulSet, or other similar resource) to scale back in. However, the Horizontal Pod Autoscaler does not natively support scaling down to 0. There are a few operators, such as Knative or KEDA, that allow you to overcome this limitation by intercepting the requests coming to your pods or by checking specific metrics. However, these are sophisticated mechanisms for achieving serverless behavior and are beyond the scope of this post on schedule-based scaling to 0.

Autoscaling nodes

In Kubernetes, node autoscaling can be addressed using the Cluster Autoscaler, a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

• There are pods that failed to run in the cluster due to insufficient resources.
• There are nodes in the cluster that have been underutilized for an extended period of time and whose pods can be placed on other existing nodes.

Figure 3: Autoscaling nodes with the Cluster Autoscaler.

Cluster Autoscaler decreases the size of the cluster when some nodes are consistently unneeded for a set amount of time. A node is unneeded when it has low utilization and all of its important pods can be moved elsewhere. For Amazon EC2-based nodegroups (assuming their minimum size is set to 0), the Cluster Autoscaler scales the nodegroup to 0 if there are no pods preventing the scale-in operation.

Pricing model and cost considerations

For each Amazon EKS cluster, you pay a basic hourly rate to cover the managed control plane, as well as the cost of running the Amazon EC2-backed data plane and any associated volumes. Hourly Amazon EC2 costs vary depending on the size of the data plane and the underlying instance types. While we would continue to pay the control plane's hourly rate for non-production clusters used for testing or quality assurance, we may not need the data plane to be available 24 hours a day, including weekends. By establishing a schedule-based approach that scales the nodegroups to 0 when they're unneeded, we can significantly optimize overall Amazon EC2 compute costs.

Cost savings can go beyond bare Amazon EC2 costs. For example, if you use Amazon CloudWatch Container Insights for monitoring, then you would not be charged while nodes are down, given that the costs associated with metrics ingestion are prorated by the hour.

In this post, we'll show you how you can achieve schedule-based scale to 0 for your data plane with the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler, as well as with Karpenter.

Current mechanisms to scale to zero using HPA and Cluster Autoscaler

We have seen how Kubernetes traditionally handles autoscaling for both pods and nodes, and how the current implementation of the Horizontal Pod Autoscaler can't handle schedule-based scale-to-0 scenarios. However, the native capabilities can be supplemented with dedicated Kubernetes CronJobs or community-driven open-source solutions like cron-hpa or kube-downscaler, which can scale pods to 0 on specific schedules. Additionally, we need to make sure that we can not only scale in to 0 but also scale out from 0. Since Kubernetes version 1.24, a new feature integrated into the Cluster Autoscaler makes this easier.
Quoting the official announcement: For Kubernetes 1.24, we have contributed a feature to the upstream Cluster Autoscaler project that simplifies scaling the Amazon EKS managed node group (MNG) to and from zero nodes. Before, you had to tag the underlying EC2 Auto Scaling group (ASG) for the Cluster Autoscaler to recognize the resources, labels, and taints of an MNG that was scaled to zero nodes. Starting with Kubernetes version 1.24, when there are no running nodes in the MNG, the Cluster Autoscaler calls the Amazon EKS DescribeNodegroup API to get the information it needs about MNG resources, labels, and taints. When the value of a Cluster Autoscaler tag on the ASG powering an Amazon EKS MNG conflicts with the value from the MNG itself, the Cluster Autoscaler prefers the ASG tag so that customers can override values as necessary.

Thanks to this new feature, the Cluster Autoscaler determines which nodegroup needs to be scaled out from 0 based on the definition of the unschedulable pods; but in order to do so, it must be up and running. In other words: we cannot scale all of our nodegroups to 0, as we need to guarantee that a minimal stack of core components is constantly up and running. Such a stack would include, at the very least: the Cluster Autoscaler, CoreDNS, and the open-source tool of our choice to cover schedule-based scaling of pods. Ideally, we might also need to accommodate the Cluster Proportional Autoscaler (CPA) to address CoreDNS scalability. To be cost efficient, we might decide to create a dedicated nodegroup for the core components, backed by cheap instance types, and separate nodegroups for applicative workloads.

Putting it all together:

1. kube-downscaler or cron-hpa applies schedule-based scaling to or from 0 for applicative workloads.
2. Cluster Autoscaler notices when nodes can be scaled in (including to 0) as underutilized, or when some pods cannot be scheduled due to insufficient resources and nodes need to scale out (including from 0).
3. Cluster Autoscaler interacts with the AWS ASG API (Application Programming Interface) to terminate or provision nodes.
4. The nodegroup is scaled to or from 0 as expected.

Figure 4: Schedule-based scale to 0 using an EC2-backed technical nodegroup for core components.

Eventually, this pattern can be further optimized by moving the minimal stack of core components to AWS Fargate. This means that not a single Amazon EC2 instance is running when the data plane is unneeded. The cost implications of hosting the core components on AWS Fargate must be carefully assessed; keeping lower-cost Amazon EC2 instance types may result in a less elegant but more cost-effective solution.

Figure 5: Schedule-based scale to 0 using a Fargate profile for core components.

How it is done with Karpenter

With Karpenter, we have the concept of a provisioner. Provisioners set constraints on the nodes that Karpenter can create and the pods that can run on those nodes. With the current version of Karpenter (0.28.x), there are three ways to scale down the number of nodes to zero using provisioners:

1. Delete all provisioners. Deleting provisioners causes all nodes to be deleted. This option is the simplest to implement, but it may not be feasible in all situations. For example, if you have multiple tenants sharing the same cluster, you may not want to delete all provisioners, as this would prevent any tenant from running workloads.
2. Scale all workloads to zero. Karpenter then deletes the unused nodes.
This option is more flexible than deleting all provisioners, but it may not be ideal if your workloads are managed by different teams, and it might be difficult to implement in a GitOps setup.

3. Add a zero CPU limit to provisioners and then delete all nodes. This option is the most flexible, as it allows you to keep your workload definitions in place while still scaling down the number of nodes to zero. To do this, you need to update the spec.limits.cpu field of your provisioners.

The first two options may be difficult to implement in multi-tenant configurations or with GitOps frameworks. Therefore, this post focuses on the third option.

Walkthrough

Technical considerations

Programmatically scaling provisioner limits to zero can be done in a number of ways. One common pattern is to use Kubernetes CronJobs. For example, the following CronJob scales the provisioner limits to zero every weekday at 10:30 PM:

  ---
  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: scale-down-karpenter
  spec:
    schedule: "30 22 * * 1-5"
    jobTemplate:
      spec:
        [...]
              command:
                - /bin/sh
                - -c
                - |
                  kubectl patch provisioner test-provisioner --type merge \
                    --patch '{"spec": {"limits": {"resources": {"cpu": "0"}}}}' \
                    && echo "test-provisioner patched at $(date)";
        [...]

This job runs every weekday night at 10:30 PM and scales the provisioner's limits to zero, which effectively disables the creation of new nodes until the limit is manually scaled back up (a complementary scale-up sketch appears at the end of this item). CronJobs can be used with AWS Lambda to terminate running nodes, or to implement more complex logic such as scaling other infrastructure components, handling errors and notifications, or any event-driven pattern that can be connected to an application or workload. AWS Step Functions can add a further layer of orchestration, allowing you to interact with your cluster using the Kubernetes API and run jobs as part of your application's workflow. More information on how to use the Kubernetes API integrations with AWS Step Functions can be found here.

This is a simplified example of an AWS Lambda function that can be used to terminate the remaining Karpenter nodes:

  def lambda_handler(event, context):
      [...]
      filters = [
          {"Name": "instance-state-name", "Values": ["running"]},
          {"Name": "tag:karpenter.sh/provisioner-name", "Values": ["example123"]},
          {"Name": "tag:aws:eks:cluster-name", "Values": ["example123"]},
      ]
      try:
          instances = ec2.instances.filter(Filters=filters)
          RunningInstances = [instance.id for instance in instances]
      except botocore.exceptions.ClientError as error:
          logging.error("Some error message here")
          raise error
      if len(RunningInstances) > 0:
          for instance_id in RunningInstances:
              logging.info("Found Karpenter node: {}".format(instance_id))
          try:
              ec2.instances.filter(InstanceIds=RunningInstances).terminate()
          except botocore.exceptions.ClientError as error:
              logging.error("Some error message here")
              raise error
      [...]

Note: these steps can be difficult to orchestrate in a GitOps setup. The general advice is to create specific exceptions for provisioner limits. Purely as an example, this is how it can be done with Argo CD:

  apiVersion: argoproj.io/v1alpha1
  kind: Application
  metadata:
    name: karpenter
    namespace: argocd
  spec:
    ignoreDifferences:
      - group: karpenter.sh
        kind: Provisioner
        jsonPointers:
          - /spec/limits/resources/cpu

How to move core components to AWS Fargate for further optimization

Karpenter and the Cluster Autoscaler run a controller inside a pod on the cluster. This controller needs to be up and running to orchestrate scale operations, up or down.
This means that at least one node must be running on the cluster to host those controllers. However, if you are interested in scale-to-zero scenarios, there is an option worth considering: AWS Fargate. AWS Fargate is a serverless compute engine that allows you to run containers without having to manage any underlying infrastructure. This means that you can scale your application up and down as needed, without having to worry about running out of resources.

AWS Fargate profiles that run Karpenter can be configured via the AWS Command Line Interface (AWS CLI), AWS Management Console, CDK (Cloud Development Kit), Terraform, AWS CloudFormation, and eksctl. The following example shows how to configure those profiles with eksctl:

  apiVersion: eksctl.io/v1alpha5
  kind: ClusterConfig
  metadata:
    name: <cluster-name>
    region: <aws-region>
  fargateProfiles:
    [...]
    - name: karpenter
      podExecutionRoleARN: arn:aws:iam::12345678910:role/FargatePodExecutionRole
      selectors:
        - labels:
            app.kubernetes.io/name: karpenter
          namespace: karpenter
      subnets:
        - subnet-12345
        - subnet-67890
    - name: karpenter-scaledown
      podExecutionRoleARN: arn:aws:iam::12345678910:role/FargatePodExecutionRole
      selectors:
        - labels:
            job-name: scale-down-karpenter*
          namespace: karpenter
      subnets:
        - subnet-12345
        - subnet-67890
    [...]

Note: By default, CoreDNS is configured to run on Amazon EC2 infrastructure on Amazon EKS clusters. If you want to run your pods only on AWS Fargate, then refer to the Getting started with AWS Fargate using Amazon EKS guide.

Conclusions

In this post, we showed you how to scale your Amazon EKS clusters to save money and reduce your environmental impact. By using the Cluster Autoscaler and Karpenter, you can easily and effectively scale your clusters up and down as needed. These tools can help you scale your Amazon EKS clusters to zero nodes and save on resource utilization and carbon footprint. If you want to get started with Karpenter, you can find the official documentation here. The documentation includes installation instructions and the configuration of provisioners and all the other components required to orchestrate autoscaling. This guide focuses on Amazon EKS, but the same concepts apply to self-hosted Kubernetes solutions. View the full article
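To complement the scale-down CronJob shown above, a second CronJob could restore capacity on a schedule by patching the limit back up. This is a minimal sketch assuming the same test-provisioner name; the schedule, CPU value, service account, and image are illustrative assumptions:

  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: scale-up-karpenter
  spec:
    schedule: "0 7 * * 1-5"    # illustrative: weekday mornings at 7:00 AM
    jobTemplate:
      spec:
        template:
          spec:
            # Hypothetical service account bound to RBAC that allows patching provisioners
            serviceAccountName: karpenter-scaler
            restartPolicy: OnFailure
            containers:
              - name: patch
                image: bitnami/kubectl   # any image that ships kubectl
                command:
                  - /bin/sh
                  - -c
                  - |
                    # Raise the CPU limit so Karpenter can provision nodes again
                    kubectl patch provisioner test-provisioner --type merge \
                      --patch '{"spec": {"limits": {"resources": {"cpu": "1000"}}}}'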
  4. Karpenter aims to enhance both the effectiveness and affordability of managing workloads within a Kubernetes cluster. The core mechanics of Karpenter involve:

• Monitoring unscheduled pods identified by the Kubernetes scheduler.
• Scrutinizing the scheduling constraints stipulated by the pods, including resource requests, node selectors, affinities, tolerations, and topology spread constraints (a sketch of such a pod follows below).
• Provisioning nodes that precisely align with the pods' requirements.
• Streamlining cluster resource usage by removing nodes once they are no longer required.

In this article, I talk about how to set up and use Karpenter for managing worker nodes in EKS... View the full article
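To make the list above concrete, here is a hedged sketch of a pending pod whose constraints Karpenter would take into account when provisioning a node. All names and values are illustrative, not taken from the article:

  apiVersion: v1
  kind: Pod
  metadata:
    name: demo-batch-worker        # illustrative
  spec:
    nodeSelector:
      kubernetes.io/arch: arm64    # node selector the provisioned node must satisfy
    tolerations:
      - key: dedicated             # toleration matching a hypothetical tainted node pool
        value: batch
        effect: NoSchedule
    containers:
      - name: app
        image: public.ecr.aws/docker/library/busybox:latest
        command: ["sleep", "3600"]
        resources:
          requests:
            cpu: "2"               # aggregated with other pending pods to size the node
            memory: 4Gi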
  5. Karpenter is an open-source cluster autoscaler that provisions right-sized nodes in response to unschedulable pods, based on aggregated CPU, memory, and volume requests and other Kubernetes scheduling constraints (e.g., affinities and pod topology spread constraints), which simplifies infrastructure management. In this post, we'll describe the mechanism for patching Kubernetes worker nodes provisioned with Karpenter through a gated Karpenter feature called Drift (a sketch of enabling the gate follows below). If you have many worker nodes across multiple Amazon EKS clusters, then this mechanism can help you continuously patch at scale… View the full article
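As a hedged illustration only: based on the feature-gate flag syntax quoted in result 2 above (which may vary across Karpenter versions), the Drift gate could be enabled by passing the flag to the Karpenter controller container, for example in its Deployment spec:

  # Fragment of a Karpenter controller Deployment spec (illustrative)
  containers:
    - name: controller
      args:
        - --feature-gates
        - DriftEnabled=true   # Drift is gated before v0.33.0; enabled by default afterward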
  6. H2O.ai is a visionary leader in democratizing artificial intelligence (AI) by rapidly provisioning AI platforms that help businesses make better decisions. Our company's SaaS platform built on AWS, H2O AI Managed Cloud, enables businesses to build productive models and gain insights from their data quickly and easily. H2O.ai's platform uses data and technology to improve:

• Speed: Our platform helps businesses quickly develop and deploy AI/ML models, which leads to faster time to market and improved decision-making.
• Accuracy: Our platform uses a variety of techniques to improve the accuracy of AI/ML models, including automatic feature engineering and model selection.
• Scalability: Our platform can be scaled to handle large datasets and complex problems.
• Explainability: Our platform provides insights into how AI/ML models make decisions, which can help businesses trust and adopt these models.

This post demonstrates how we used Karpenter, an AWS open-sourced just-in-time Kubernetes autoscaler, and Bottlerocket, a secure, lightweight, purpose-built Linux-based operating system for running containers, in our Amazon Elastic Kubernetes Service (Amazon EKS) clusters. This combined functionality, along with prefetching container images, helped us improve the compute provisioning and configuration time for our ML workloads by 100-fold... View the full article
  7. CoStar is well known as a market leader for commercial real estate data, but they also run major home, rental, and apartment websites (including apartments.com) that many have seen advertised by Jeff Goldblum. CoStar's traditional commercial real estate customers are highly informed users who use large and complex data to make critical business decisions. Successfully helping customers analyze 6 million properties with 130 billion sq. ft. of space, and decide which to rent, has made CoStar a leader in data and analytics technology. When CoStar began building the next generation of their apartments and homes websites, it became clear that the user profile and customer demands had important differences from their long-running commercial real estate customers. CoStar needed to deliver the same decision-making value to their new customer base, but for magnitudes more customers and data. This initiated CoStar's migration from their legacy data centers to AWS for the speed and elasticity needed to deliver the same value to millions of users accessing hundreds of millions of properties... View the full article
  8. Cloud-native technologies are becoming increasingly ubiquitous, and Kubernetes is at the forefront of this movement. Today, Kubernetes is seeing widespread adoption across organizations in a variety of different industries. When implemented properly, Kubernetes can help these organizations achieve higher availability, scalability, and resiliency for their workloads. Combining Kubernetes with the attributes of cloud computing, such as unparalleled scalability and elasticity, can help organizations enhance their containerized applications' resiliency and availability. As detailed in this introductory post, Karpenter's objective is to make sure that your cluster's workloads have the compute they need, no more and no less, right when they need it. In its most recent updates, Karpenter added support for more advanced scheduling constraints, such as pod affinity and anti-affinity, topology spread, node affinity, node selection, and resource requests. This post will specifically delve into podAffinity, podAntiAffinity, and volume topology awareness, and elaborate on the use cases that they're best suited for (a brief sketch of the first two follows below)... View the full article
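As a brief, hedged illustration of podAntiAffinity combined with topology spread (the names and values here are illustrative assumptions, not the article's own examples):

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: web        # illustrative
  spec:
    replicas: 3
    selector:
      matchLabels:
        app: web
    template:
      metadata:
        labels:
          app: web
      spec:
        affinity:
          podAntiAffinity:
            # Require replicas to land on distinct nodes
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: web
                topologyKey: kubernetes.io/hostname
        topologySpreadConstraints:
          # Spread replicas evenly across Availability Zones
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: web
        containers:
          - name: web
            image: public.ecr.aws/nginx/nginx:latest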
  9. Amazon Elastic Kubernetes Service (EKS) is announcing v0.9.0 of the Karpenter open-source cluster autoscaling project. Karpenter is a flexible, high-performance Kubernetes cluster autoscaler that helps improve application availability and resource utilization. Karpenter v0.9.0 adds support for the Kubernetes podAffinity and podAntiAffinity scheduling constraints, which increases its compatibility with popular third-party Helm charts and expands support for high-availability use cases. View the full article