Showing results for tags 'karpenter'.

Found 8 results

  1. Introduction

Karpenter is a Kubernetes node lifecycle manager created by AWS, initially released in 2021 with the goal of minimizing cluster node configurations. Over the past year, it has seen tremendous growth, reaching over 4,900 stars on GitHub and merging code from more than 200 contributors. It is in the process of being donated to the Cloud Native Computing Foundation (CNCF) as part of the Kubernetes Autoscaling Special Interest Group. As part of this growth, there is a growing need for Karpenter's APIs to mature and to offer more stringent stability guarantees to users who don't want to deal with the number of breaking changes the project has made in its alpha state. This marks a significant milestone in the project's evolution. With this transition, customers benefit from the increased level of maturity and API stability that the beta version offers. It also marks a commitment from us to prioritize backward compatibility, which means customers can confidently adopt new features and enhancements without worrying about disruptive changes down the line.

This release, like previous releases, incorporates feedback from the open-source community. The API changes are being rolled out as part of the Karpenter v0.32.0 release. Existing deployments need to be upgraded to this version, following the migration path outlined in this post and further detailed in the Karpenter upgrade guide. The existing alpha APIs are now deprecated and remain available only in this single version. Starting with release v0.33.0, Karpenter will only support its v1beta1 APIs.

Karpenter APIs follow a maturity progression of alpha → beta → stable. The graduation from alpha to beta required significant changes to the APIs, which are highlighted in this post. We don't anticipate the graduation from beta to stable to require the same level of changes. If you're curious about the Kubernetes API graduation process, then please see this post.

What is changing

On the journey to a stable v1, we've made significant changes to our APIs from alpha to beta to improve ease of use by dropping the areas of the APIs that commonly gave users problems. One of these areas was naming, where we saw confusion around the use of the word provisioner (an overloaded term in the realm of storage) and generally wanted to reduce the number of concepts that users had to reason about. With this release, Karpenter deprecates the Provisioner, AWSNodeTemplate, and Machine APIs, and introduces NodePool, EC2NodeClass, and NodeClaim. We've taken a holistic view and streamlined the APIs around the single concept of a Node.

Walkthrough

API group and kind naming

The v1beta1 version introduces the following new APIs while deprecating the existing ones:

- karpenter.sh/Provisioner becomes karpenter.sh/NodePool
- karpenter.sh/Machine becomes karpenter.sh/NodeClaim
- karpenter.k8s.aws/AWSNodeTemplate becomes karpenter.k8s.aws/EC2NodeClass

Each of these naming changes comes with schema changes that need to be considered as you update to the latest version of Karpenter. Let's look at each change and what the new API definition looks like.

v1alpha5/Provisioner → v1beta1/NodePool

NodePool serves as the successor to Provisioner, exposing configuration-based parameters that affect the compatibility between Nodes and Pods during scheduling (such as requirements, taints, and labels). It also encompasses behavior-based settings for fine-tuning Karpenter's scheduling and deprovisioning decision-making.
A pool resolves to a mix of instance types and sizes, while still enforcing limits on how workloads request resources. It facilitates the grouping of provisioning and deprovisioning behavior. Importantly, a pool shouldn't contain any cloud-specific configuration, so that the configuration stays portable. In Karpenter v1beta1, all non-behavioral fields are encapsulated within the NodePool template field. In the case of Karpenter, NodePools template NodeClaims, which are then orchestrated by the Karpenter controller. This mirrors the concept of Deployments, where Pods are templated and orchestrated by the deployment controller. You can read more about NodePools in the documentation. An example NodePool looks something like this:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
...
spec:
  template:
    metadata:
      annotations:
        custom-annotation: custom-value
      labels:
        team: team-a
        custom-label: custom-value
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        ...
      kubelet:
        systemReserved:
          cpu: 100m
          memory: 100Mi
          ephemeral-storage: 1Gi
        maxPods: 20
  disruption:
    expireAfter: 360h
    consolidationPolicy: WhenUnderutilized

When you look at the example specification, you'll notice a new section called disruption. This groups the previous settings for consolidation, expiration, and empty nodes (ttlSecondsAfterEmpty, ttlSecondsUntilExpired, and consolidation.enabled). Karpenter sets defaults for the disruption configuration if it isn't specified when applying the NodePool manifest. The default values are listed below, and you can read more about the behavior of these fields in the documentation.

Field                                   Default
spec.disruption.consolidationPolicy     WhenUnderutilized
spec.disruption.expireAfter             720h

v1alpha1/AWSNodeTemplate → v1beta1/EC2NodeClass

EC2NodeClass serves as the successor to AWSNodeTemplate, exposing cloud-provider-specific fields that affect the launch and bootstrap process for a Node, including the Amazon Machine Image (AMI), security groups, and subnets you want to use, as well as details about block storage, user data, and instance metadata settings. The Karpenter spec.instanceProfile field has been removed from the EC2NodeClass in favor of the spec.role field. Karpenter now auto-generates the instance profile in your EC2NodeClass given the role that you specify. The spec.launchTemplateName field for referencing unmanaged launch templates within Karpenter, which was already deprecated, has been removed. If you are still using it, then you need to migrate to Karpenter-managed launch templates using EC2NodeClass. You can read more about EC2NodeClass in the documentation. An example EC2NodeClass looks something like this:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  role: KarpenterNodeRole-karpenter-demo
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: karpenter-demo
  tags:
    test-tag: test-value

v1alpha5/Machine → v1beta1/NodeClaim

In Karpenter v0.28.0, a new type called Machine was added. It enabled several node provisioning improvements that allow nodes to join the cluster through native Kubernetes controllers while still letting Karpenter manage and track the node. If you're on a version of Karpenter before v0.28.0, then you won't have this resource type. With release v0.32.0, this type has changed to NodeClaim.
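For illustration only, a Karpenter-created NodeClaim looks roughly like the sketch below. The fields mirror the NodePool template shown earlier; the object name, label, and values are hypothetical, and the sketch is an approximation rather than the full schema.

apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  name: default-x7k2p                     # hypothetical name generated by Karpenter
  labels:
    karpenter.sh/nodepool: default        # owning NodePool
spec:
  nodeClassRef:
    name: default                         # the EC2NodeClass shown above
  requirements:                           # carried over from the NodePool template
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]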
NodeClaims aren't intended to be created by cluster operators; instead, they're created and deleted by Karpenter. You shouldn't have to make any changes for NodeClaims to work, but if you're troubleshooting a node in a cluster, this is a great place to see the lifecycle and health of a node as Karpenter manages it.

Label changes

Karpenter v1beta1 introduces changes to the common labels karpenter.sh/do-not-evict and karpenter.sh/do-not-consolidate, which have been deprecated and unified under the single label karpenter.sh/do-not-disrupt. It can be applied to both Pods and Nodes and prevents Karpenter from disrupting the node or evicting the Pod.

More flexible selectors for AMIs, subnets, and security groups in NodeClass

The previous selector settings were somewhat limited in their capacity to identify and use different settings for nodes being provisioned. The existing behavior applied AND logic, which made it harder to match settings across various clusters and regions. To address this, we've extended the selectors so that you can specify multiple terms. These terms are combined using OR logic: they're evaluated in turn until a match is identified. An example for matching an AMI with name my-name1 or my-name2, and owner 123456789 or amazon, would look like this:

amiSelectorTerms:
  - name: my-name1
    owner: 123456789
  - name: my-name2
    owner: 123456789
  - name: my-name1
    owner: amazon
  - name: my-name2
    owner: amazon

Similar settings can be made for subnetSelectorTerms and securityGroupSelectorTerms, which you can read more about in the Karpenter documentation.

securityGroupSelectorTerms:
  - id: abc-123
    name: default-security-group   # Not the same as the name tag
    tags:
      key: value
  # Selector terms are ORed
  - id: abc-123
    name: custom-security-group    # Not the same as the name tag
    tags:
      key: value

Drift enabled by default

Starting from the next release (v0.33.0), the drift feature will be enabled by default. If you don't specify the Drift featureGate, the feature is assumed to be enabled. You can disable the drift feature by specifying --feature-gates DriftEnabled=false in the command line arguments to Karpenter. This feature gate is expected to be dropped entirely when the core APIs (NodePool, NodeClaim) are bumped to v1.

Migration path

Update the Karpenter controller AWS IAM role

The Karpenter controller uses an AWS Identity and Access Management (AWS IAM) role to grant the permissions needed to launch and operate Amazon Elastic Compute Cloud (Amazon EC2) instances in your AWS account. As part of the upgrade, create a new permission policy with the following changes:

- Scope the ec2:RunInstances, ec2:CreateFleet, and ec2:CreateLaunchTemplate permissions down to the tag-based constraint karpenter.sh/nodepool instead of the previous tag key karpenter.sh/provisioner-name.
- Grant permissions for the actions iam:CreateInstanceProfile, iam:AddRoleToInstanceProfile, iam:RemoveRoleFromInstanceProfile, iam:DeleteInstanceProfile, and iam:GetInstanceProfile. All of these permissions (with the exception of iam:GetInstanceProfile) are constrained by a tag-based policy that ensures the controller can only operate on instance profiles that it was responsible for creating. These are needed to support Karpenter-managed instance profiles.

Once the migration is complete and you've rolled out the new nodes as described below, you can safely remove the previous permission policy.
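To make those two changes concrete, here is a rough sketch of what the new statements can look like, written as CloudFormation-style YAML. The statement names, condition keys, and tag values are illustrative assumptions (create actions in particular typically also constrain aws:RequestTag); the authoritative policy is the one referenced below from the Karpenter getting started material.

# Illustrative sketch only; not the official policy document.
- Sid: AllowScopedInstanceProfileActions
  Effect: Allow
  Action:
    - iam:CreateInstanceProfile
    - iam:AddRoleToInstanceProfile
    - iam:RemoveRoleFromInstanceProfile
    - iam:DeleteInstanceProfile
  Resource: "*"
  Condition:
    StringEquals:
      aws:ResourceTag/kubernetes.io/cluster/my-cluster: owned   # assumed cluster-ownership tag
- Sid: AllowInstanceProfileRead
  Effect: Allow
  Action: iam:GetInstanceProfile          # intentionally not tag-constrained
  Resource: "*"
- Sid: AllowScopedEC2LaunchActions
  Effect: Allow
  Action:
    - ec2:RunInstances
    - ec2:CreateFleet
    - ec2:CreateLaunchTemplate
  Resource: "*"
  Condition:
    StringLike:
      aws:ResourceTag/karpenter.sh/nodepool: "*"   # replaces karpenter.sh/provisioner-name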
An example of the permission policy is available in the Karpenter GitHub repository, and it is distributed as part of the project's getting started AWS CloudFormation template.

API migration

To transition from the alpha APIs to the new v1beta1 APIs, you should first install the new v1beta1 Custom Resource Definitions (CRDs). Subsequently, you need to generate the beta equivalent of each alpha API for both Provisioners and AWSNodeTemplates. It's worth noting that the migration from Machine to NodeClaim is managed by Karpenter as you transition your custom resources from Provisioners to NodePools, and it remains seamless for users.

We're happy to introduce karpenter-convert, a command line utility designed to streamline the creation of NodePool and EC2NodeClass objects. The following steps show how to use this tool:

- Install the command line utility:
  go install github.com/aws/karpenter/tools/karpenter-convert/cmd/karpenter-convert@latest
- Migrate each Provisioner into a NodePool:
  karpenter-convert -f provisioner.yaml > nodepool.yaml
- Migrate each AWSNodeTemplate into an EC2NodeClass:
  karpenter-convert -f nodetemplate.yaml > nodeclass.yaml

For each EC2NodeClass generated by the tool, you need to manually specify the AWS role. The tool leaves a placeholder $KARPENTER_NODE_ROLE, which you need to replace with your actual role name. For each Provisioner resource, you need to decide whether you want to roll nodes one at a time or roll all of a Provisioner's nodes at once. A detailed step-by-step guide is provided in the following sections.

Periodic rolling with drift

With drift enabled, for each Provisioner in your cluster, perform the following actions:

- Migrate your alpha CRDs to v1beta1.
- Add a taint to the old Provisioner, such as karpenter.sh/legacy=true:NoSchedule.
- Karpenter drift marks all machines/nodes owned by that Provisioner as drifted.
- Karpenter drift launches replacements for the nodes in the new NodePool resource.
- Currently, Karpenter only supports rolling one node at a time, which means that it may take some time for Karpenter to completely roll all nodes under a single Provisioner.

Forced deletion

For each Provisioner in your cluster, perform the following actions:

- Create a NodePool/EC2NodeClass in your cluster that is the v1beta1 equivalent of the v1alpha5 Provisioner/AWSNodeTemplate.
- Delete the old Provisioner with kubectl delete provisioner <provisioner-name> --cascade=foreground.
- Karpenter deletes each Node that is owned by the Provisioner, draining all nodes simultaneously and launching nodes for the newly pending pods as soon as the Nodes enter a draining state.

Manual rolling

For each Provisioner in your cluster, perform the following actions:

- Create a NodePool/EC2NodeClass in your cluster that is the v1beta1 equivalent of the v1alpha5 Provisioner/AWSNodeTemplate.
- Add a taint to the old Provisioner, such as karpenter.sh/legacy=true:NoSchedule.
- Delete each node owned by the Provisioner one at a time by running kubectl delete node <node-name>.

Conclusion

In this post, we showed you the modifications introduced by the new APIs and provided insight into the reasoning behind these changes, which have been shaped by feedback from the community. We're thrilled to witness the growing maturity of the Karpenter project. We anticipate that the majority of these changes will eventually move to the stable v1 API, which will enable a broader user base to take full advantage of Karpenter's capabilities in workload-native node provisioning.
There are some other deprecations and changes that we didn't cover in this post. Please head to the Karpenter upgrade guide for comprehensive migration guidance. Before you upgrade Karpenter to v0.32.0, we recommend reading the full release notes. If you have any questions, then please feel free to reach out in the Kubernetes Slack #karpenter channel or on GitHub, where we welcome feedback that helps us prioritize and develop new features. View the full article
  2. Introduction

The Cluster Autoscaler has been the de facto industry-standard autoscaling mechanism on Kubernetes since the platform's earliest versions. However, with the evolving complexity and number of containerized workloads, our customers running on Amazon Elastic Kubernetes Service (Amazon EKS) started to ask for a more flexible way to allocate compute resources to pods and for more flexibility in instance size and heterogeneity. We addressed those needs with Karpenter, a product that automatically launches just the right compute resources to handle your cluster's applications. Karpenter is designed to take full advantage of Amazon Elastic Compute Cloud (Amazon EC2). Although they serve the same purpose, the Cluster Autoscaler and Karpenter take very different approaches to autoscaling. In this post, we won't focus on the differences between the two solutions; instead, we'll analyze how each can be used to fulfill a specific requirement: scaling an Amazon EKS cluster to zero nodes.

Scaling an Amazon EKS cluster to zero nodes can be useful for a variety of reasons. For example, you might want to scale your cluster down to zero nodes when there is no traffic, or when you are performing maintenance. This not only reduces costs but also improves the sustainability of resource utilization.

Solution overview

Cost considerations of scaling down clusters

The cost optimization pillar of the AWS Well-Architected Framework includes a specific section that focuses on the financial advantages of implementing a just-in-time supply strategy. Autoscaling is often the preferred approach for matching supply with demand.

Figure 1: Adjusting capacity as needed.

Autoscaling in Amazon EKS

When it comes to Amazon EKS, we need to think of control plane autoscaling and data plane autoscaling as two separate concerns. When Amazon EKS launched in 2018, the goal was to reduce users' operational overhead by providing a managed control plane for Kubernetes. Initially, this included automated upgrades, patches, and backups, but with fixed capacity. An Amazon EC2-backed data plane (with the exception of AWS Fargate) is not fully managed by AWS. Managed node groups reduce the operational burden by automating the provisioning and lifecycle management of nodes, but upgrades, patches, backups, and autoscaling remain the responsibility of the user. In this post, we'll cover data plane autoscaling. Since there are different ways to run Amazon EKS nodes (Amazon EC2 instances, AWS Fargate, or AWS Outposts), we'll focus on Amazon EKS nodes running on Amazon EC2. Before we go any further, let's take a closer look at how Kubernetes traditionally handles autoscaling for pods and nodes.

Autoscaling pods

In Kubernetes, pod autoscaling is handled by the Horizontal Pod Autoscaler (HPA), which automatically updates a workload resource (such as a Deployment or StatefulSet) with the aim of scaling the workload to match demand. Horizontal scaling means that the response to increased load is to deploy more pods. This is different from vertical scaling, which, for Kubernetes, means assigning more resources (e.g., memory or central processing units [CPUs]) to the pods that are already running for the workload.

Figure 2: Autoscaling pods with the Horizontal Pod Autoscaler.
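As a minimal illustration of the mechanism described above, the following HPA sketch (hypothetical workload name and thresholds) scales a Deployment between one and ten replicas based on average CPU utilization. Note that minReplicas cannot natively be set to 0, which is the limitation discussed next.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                    # hypothetical workload name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1               # the HPA does not natively scale to zero
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70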
When the load decreases and the number of pods is above the configured minimum, the Horizontal Pod Autoscaler instructs the workload resource (i.e., the Deployment, StatefulSet, or other similar resource) to scale back in. However, the Horizontal Pod Autoscaler does not natively support scaling down to 0. A few operators, such as Knative or KEDA, allow you to overcome this limitation by intercepting the requests coming to your pods or by checking specific metrics. However, these are sophisticated mechanisms for achieving serverless behaviour and are beyond the scope of this post on schedule-based scaling to 0.

Autoscaling nodes

In Kubernetes, node autoscaling can be addressed using the Cluster Autoscaler, a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

- There are pods that failed to run in the cluster due to insufficient resources.
- There are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.

Figure 3: Autoscaling nodes with the Cluster Autoscaler.

The Cluster Autoscaler decreases the size of the cluster when some nodes are consistently unneeded for a set amount of time. A node is unnecessary when it has low utilization and all of its important pods can be moved elsewhere. When it comes to Amazon EC2-based nodegroups (assuming their minimum size is set to 0), the Cluster Autoscaler scales the nodegroup to 0 if there are no pods preventing the scale-in operation.

Pricing model and cost considerations

For each Amazon EKS cluster, you pay a base hourly rate for the managed control plane, as well as the cost of running the Amazon EC2-backed data plane and any associated volumes. Hourly Amazon EC2 costs vary depending on the size of the data plane and the underlying instance types. While we would continue to pay the hourly rate for the control plane of non-production clusters used for testing or quality assurance purposes, we may not need the data plane to be available 24 hours a day, including weekends. By establishing a schedule-based approach to scale the nodegroups to 0 when they're unneeded, we can significantly optimize the overall Amazon EC2 compute costs.

Cost savings can go beyond bare Amazon EC2 costs. For example, if you use Amazon CloudWatch Container Insights for monitoring, then you aren't charged while nodes are down, given that the costs associated with metrics ingestion are prorated by the hour.

In this post, we'll show you how to achieve schedule-based scale to 0 for your data plane with the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler, as well as with Karpenter.

Current mechanisms to scale to zero using HPA and cluster autoscaler

We have seen how Kubernetes traditionally handles autoscaling for both pods and nodes. We've also seen how current implementations of the Horizontal Pod Autoscaler can't handle schedule-based scale-to-0 scenarios. However, the native capabilities can be supplemented with dedicated Kubernetes CronJobs or community-driven open-source solutions like cron-hpa or kube-downscaler, which can scale pods to 0 on specific schedules. Additionally, we need to make sure that not only can we scale in to 0, but that we can also scale out from 0. Since Kubernetes version 1.24, a new feature integrated into the Cluster Autoscaler makes this easier.
Quoting the official announcement: For Kubernetes 1.24, we have contributed a feature to the upstream Cluster Autoscaler project that simplifies scaling the Amazon EKS managed node group (MNG) to and from zero nodes. Before, you had to tag the underlying EC2 Autoscaling Group (ASG) for the Cluster Autoscaler to recognize the resources, labels, and taints of an MNG that was scaled to zero nodes. Starting with Kubernetes version 1.24, when there are no running nodes in the MNG, the Cluster Autoscaler calls the Amazon EKS DescribeNodegroup API to get the information it needs about MNG resources, labels, and taints. When the value of a Cluster Autoscaler tag on the ASG powering an Amazon EKS MNG conflicts with the value from the MNG itself, the Cluster Autoscaler prefers the ASG tag so that customers can override values as necessary.

Thanks to this new feature, the Cluster Autoscaler determines which nodegroup needs to be scaled out from 0 based on the definition of the unschedulable pods, but in order for it to do so, it must be up and running. In other words: we cannot scale all of our nodegroups to 0, as we need to guarantee that a minimal stack of core components is constantly up and running. Such a stack would include, at the very least, the Cluster Autoscaler, CoreDNS, and the open-source tool of our choice to cover schedule-based scaling of pods. Ideally, we might also need to accommodate the Cluster Proportional Autoscaler (CPA) to address CoreDNS scalability. To be cost efficient, we might decide to create a dedicated nodegroup for the core components, backed by cheap instance types, and separate nodegroups for application workloads.

Putting it all together:

- Kube-downscaler or cron-hpa applies schedule-based scaling to or from 0 for application workloads.
- The Cluster Autoscaler notices whether nodes can be scaled in (including to 0) because they are underutilized, or whether some pods cannot be scheduled due to insufficient resources and nodes need to scale out (including from 0).
- The Cluster Autoscaler interacts with the AWS ASG API (Application Programming Interface) to terminate or provision nodes.
- The nodegroup is scaled to or from 0 as expected.

Figure 4: Schedule-based scale to 0 using an EC2-backed technical nodegroup for core components.

Eventually, this pattern can be further optimized by moving the minimal stack of core components to AWS Fargate. This means that not a single Amazon EC2 instance is running when the data plane is unneeded. The cost implications of hosting the core components on AWS Fargate must be carefully assessed; keeping the lower-cost Amazon EC2 instance types may result in a less elegant but more cost-effective solution.

Figure 5: Schedule-based scale to 0 using a Fargate profile for core components.

How it is done with Karpenter

With Karpenter, we have the concept of a provisioner. Provisioners set constraints on the nodes that can be created by Karpenter and the pods that can run on those nodes. With the current version of Karpenter (0.28.x), there are three ways to scale down the number of nodes to zero using provisioners:

1. Delete all provisioners. Deleting provisioners causes all nodes to be deleted. This option is the simplest to implement, but it may not be feasible in all situations. For example, if you have multiple tenants sharing the same cluster, you may not want to delete all provisioners, as this would prevent any tenant from running workloads.
2. Scale all workloads to zero. Karpenter then deletes the unused nodes.
   This option is more flexible than deleting all provisioners, but it may not be ideal if your workloads are managed by different teams, and it might be difficult to implement in a GitOps setup.
3. Add a zero CPU limit to provisioners and then delete all nodes. This option is the most flexible, as it allows you to keep your workloads running while still scaling down the number of nodes to zero. To do this, you need to update the spec.limits.cpu field of your provisioners.

The first two options may be difficult to implement in multi-tenant configurations or with GitOps frameworks. Therefore, this post focuses on the third option.

Walkthrough

Technical considerations

Programmatically scaling provisioner limits to zero can be done in a number of ways. One common pattern is to use Kubernetes CronJobs. For example, the following CronJob scales the provisioner limits to zero every weekday at 10:30 PM:

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-karpenter
spec:
  schedule: "30 22 * * 1-5"
  jobTemplate:
    spec:
      [...]
      command:
        - /bin/sh
        - -c
        - |
          kubectl patch provisioner test-provisioner --type merge --patch '{"spec": {"limits": {"resources": {"cpu": "0"}}}}' && echo "test-provisioner patched at $(date)";
      [...]

This job runs every night at 10:30 PM and scales the provisioner's limits to zero, which effectively disables the creation of new nodes until it is manually scaled back up.

CronJobs can be used with AWS Lambda to terminate running nodes, or to implement more complex logic such as scaling other infrastructure components, handling errors and notifications, or any event-driven pattern that can be connected to an application or workload. AWS Step Functions can add an additional layer of orchestration to this, allowing you to interact with your cluster using the Kubernetes API and run jobs as part of your application's workflow. More information on how to use the Kubernetes API integrations with AWS Step Functions can be found here.

This is a simplified example of an AWS Lambda function that can be used to terminate the remaining Karpenter nodes:

def lambda_handler(event, context):
    [...]
    filters = [
        {'Name': 'instance-state-name', 'Values': ['running']},
        {'Name': 'tag:karpenter.sh/provisioner-name', 'Values': ['example123']},
        {'Name': 'tag:aws:eks:cluster-name', 'Values': ['example123']}
    ]
    try:
        instances = ec2.instances.filter(Filters=filters)
        RunningInstances = [instance.id for instance in instances]
    except botocore.exceptions.ClientError as error:
        logging.error("Some error message here")
        raise error

    if len(RunningInstances) > 0:
        for instance_id in RunningInstances:
            logging.info('Found Karpenter node: {}'.format(instance_id))
        try:
            ec2.instances.filter(InstanceIds=RunningInstances).terminate()
        except botocore.exceptions.ClientError as error:
            logging.error("Some error message here")
            raise error
    [...]

Note: these steps can be difficult to orchestrate in a GitOps setup. The general advice is to create specific exceptions for the provisioner limits. This is purely an example of how that can be done with ArgoCD:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: karpenter
  namespace: argocd
spec:
  ignoreDifferences:
    - group: karpenter.sh
      kind: Provisioner
      jsonPointers:
        - /spec/limits/resources/cpu

How to move core components to AWS Fargate for further optimization

Karpenter and the Cluster Autoscaler run a controller inside a pod on the cluster. This controller needs to be up and running to orchestrate scale operations up or down.
This means that at least one node should be running on the cluster to host those controllers. However, if you are interested in scale-to-zero scenarios, there is an option that should be taken into consideration: AWS Fargate. AWS Fargate is a serverless compute engine that allows you to run containers without having to manage any underlying infrastructure. This means that you can scale your application up and down as needed, without having to worry about running out of resources. AWS Fargate profiles that run Karpenter can be configured via the AWS Command Line Interface (AWS CLI), the AWS Management Console, the CDK (Cloud Development Kit), Terraform, AWS CloudFormation, and eksctl. The following example shows how to configure those profiles with eksctl:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: <cluster-name>
  region: <aws-region>
fargateProfiles:
  [...]
  - name: karpenter
    podExecutionRoleARN: arn:aws:iam::12345678910:role/FargatePodExecutionRole
    selectors:
      - labels:
          app.kubernetes.io/name: karpenter
        namespace: karpenter
    subnets:
      - subnet-12345
      - subnet-67890
  - name: karpenter-scaledown
    podExecutionRoleARN: arn:aws:iam::12345678910:role/FargatePodExecutionRole
    selectors:
      - labels:
          job-name: scale-down-karpenter*
        namespace: karpenter
    subnets:
      - subnet-12345
      - subnet-67890
  [...]

Note: By default, CoreDNS is configured to run on Amazon EC2 infrastructure on Amazon EKS clusters. If you want to run your pods only on AWS Fargate in your cluster, then refer to the Getting started with AWS Fargate using Amazon EKS guide.

Conclusions

In this post, we showed you how to scale your Amazon EKS clusters to save money and reduce your environmental impact. By using the Cluster Autoscaler and Karpenter, you can easily and effectively scale your clusters up and down as needed. These tools can help you scale your Amazon EKS clusters to zero nodes and save on your resource utilization and carbon footprint. If you want to get started with Karpenter, you can find the official documentation here. The documentation includes instructions on installing Karpenter and configuring provisioners and all the other components required to orchestrate autoscaling. This guide focuses on Amazon EKS, but the same concepts apply to self-hosted Kubernetes solutions. View the full article
  3. Karpenter aims to enhance both the effectiveness and affordability of managing workloads within a Kubernetes cluster. The core mechanics of Karpenter involve:
     - Monitoring unscheduled pods identified by the Kubernetes scheduler.
     - Scrutinizing the scheduling constraints (resource requests, node selectors, affinities, tolerations, and topology spread constraints) stipulated by the pods.
     - Provisioning nodes that precisely align with the pods' requirements.
     - Streamlining cluster resource usage by removing nodes once their services are no longer required.
     In this article, I talk about how to set up and use Karpenter for managing worker nodes in EKS... View the full article
  4. Karpenter is an open-source cluster autoscaler that provisions right-sized nodes in response to unschedulable pods based on aggregated CPU, memory, volume requests, and other Kubernetes scheduling constraints (e.g., affinities and pod topology spread constraints), which simplifies infrastructure management. In this post, we’ll describe the mechanism for patching Kubernetes worker nodes provisioned with Karpenter through a gated Karpenter feature called Drift. If you have many worker nodes across multiple Amazon EKS clusters, then this mechanism can help you continuously patch at scale… View the full article
  5. H2O.ai is a visionary leader in democratizing artificial intelligence (AI) by rapidly provisioning AI platforms that help businesses make better decisions. Our company's SaaS platform built on AWS, H2O AI Managed Cloud, enables businesses to build productive models and gain insights from their data quickly and easily. H2O.ai's platform uses data and technology to improve:
     - Speed: Our platform helps businesses quickly develop and deploy AI/ML models, which leads to faster time to market and improved decision-making.
     - Accuracy: Our platform uses a variety of techniques to improve the accuracy of AI/ML models, including automatic feature engineering and model selection.
     - Scalability: Our platform can be scaled to handle large datasets and complex problems.
     - Explainability: Our platform provides insights into how AI/ML models make decisions, which can help businesses trust and adopt these models.
     This post demonstrates how we used Karpenter, an AWS open-sourced just-in-time Kubernetes autoscaler, and Bottlerocket, a secure, lightweight, purpose-built Linux-based operating system for running containers, in Amazon Elastic Kubernetes Service (Amazon EKS) clusters. This combined functionality, along with prefetching container images, helped us improve the compute provisioning and configuration time for our ML workloads by 100-fold... View the full article
  6. CoStar is well known as a market leader for Commercial Real Estate data, but they also run major home, rental, and apartments websites, including apartments.com, that many have seen advertised by Jeff Goldblum. CoStar's traditional Commercial Real Estate customers are highly informed users who rely on large and complex data to make critical business decisions. Successfully helping customers analyze and decide which of the 6 million properties with 130 billion sq. ft. of space to rent has made CoStar a leader in data and analytics technology. When CoStar began building the next generation of their Apartments and Homes websites, it became clear that the user profile and customer demands had important differences from their long-running Commercial Real Estate customers. CoStar needed to deliver the same decision-making value to their new customer base, but for magnitudes more customers and data. This initiated CoStar's migration from their legacy data centers into AWS for the speed and elasticity needed to deliver the same value to millions of users accessing hundreds of millions of properties... View the full article
  7. Cloud-native technologies are becoming increasingly ubiquitous, and Kubernetes is at the forefront of this movement. Today, Kubernetes is seeing widespread adoption across organizations in a variety of different industries. When implemented properly, Kubernetes can help these organizations achieve higher availability, scalability, and resiliency for their workloads. Combining Kubernetes with the attributes of cloud computing—such as unparalleled scalability and elasticity—can help organizations enhance their containerized applications’ resiliency and availability. As detailed in this introductory post, Karpenter‘s objective is to make sure that your cluster’s workloads have the compute they need, no more and no less, right when they need it. In its most recent updates, Karpenter added support for more advanced scheduling constraints, such as pod affinity and anti-affinity, topology spread, node affinity, node selection, and resource requests. This post will specifically delve into podAffinity, podAntiAffinity, and volume topology awareness and elaborate on the use cases that they’re best suited for... View the full article
  8. Amazon Elastic Kubernetes Service (EKS) is announcing v0.9.0 of the Karpenter open-source cluster autoscaling project. Karpenter is a flexible, high-performance Kubernetes cluster autoscaler that helps improve application availability and resource utilization. Karpenter v0.9.0 adds support for Kubernetes podAffinity and podAntiAffinity scheduling constraints, which increases its compatibility with popular third-party Helm charts and expands support for high-availability use cases. View the full article
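As a quick illustration of the kind of constraint this release can now satisfy, here is a generic pod anti-affinity sketch (hypothetical names and placeholder image). This is standard Kubernetes scheduling configuration rather than a Karpenter-specific API; Karpenter v0.9.0 and later simply takes such constraints into account when provisioning nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname   # keep replicas on separate nodes
      containers:
        - name: web
          image: nginx             # placeholder image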