This blog post was co-authored by Alex Kestner, Sr Product Manager – EKS; Todd Neal, Sr. Software Engineer – EKS; Neelendra Bhandari, Sr Software Dev Manager – EKS; and Sai Vennam, Principal Specialist Solutions Architect.

At re:Invent 2024, we launched Amazon Elastic Kubernetes Service (Amazon EKS) Auto Mode, a new feature that provides a production-ready, Kubernetes-conformant cluster that is ready to host your workloads out of the box. In this post, we dive into what this means for your Kubernetes workloads and look under the hood of EKS Auto Mode clusters.

Introduction to EKS Auto Mode

EKS Auto Mode is a streamlined way to run applications on Kubernetes. It automatically manages the setup, scaling, and maintenance of the Kubernetes control plane and worker nodes, so you don’t have to worry about the underlying infrastructure. You focus on deploying your applications, and EKS Auto Mode handles the rest, making it ideal for users who want to use Kubernetes without managing its complexity.

At re:Invent 2017, we launched Amazon EKS, which streamlined the operation of Kubernetes for our users. At launch, Amazon EKS provided a managed Kubernetes control plane integrated with existing services such as AWS Identity and Access Management (IAM). We are responsible for the health and patching of that control plane, and our users currently run tens of millions of EKS clusters every year. However, users were still responsible for operating their Kubernetes data plane: the Amazon Elastic Compute Cloud (Amazon EC2) nodes on which workloads ran. Over the years we introduced features such as Managed Node Groups and Karpenter, which reduced the burden of operating the data plane. Even so, users were still responsible for choosing the right OS, scaling the underlying nodes, and managing core add-ons and components such as the CNI and kube-proxy.
Figure 1: Shared Responsibility Model with Amazon EKS (without Auto Mode)

EKS Auto Mode is an evolution of the operating model we introduced in 2017. We now take on more responsibility for the data plane portion of a Kubernetes cluster and provide managed compute, networking, and storage capabilities. EKS Auto Mode allows users to create a cluster and immediately start deploying workloads to a production-ready environment. We are responsible for the configuration, patching, and health of the EC2 instances. Users can therefore focus only on the VPC and cluster configuration and on the application containers they run.

Figure 2: Shared Responsibility Model with EKS Auto Mode

The data plane in EKS Auto Mode

A number of critical components make up the data plane for EKS Auto Mode clusters:

- EC2 managed instances are standard EC2 instances where operational control has been delegated to an AWS service.
- Bottlerocket is an open source operating system purpose-built by AWS for running containers.
- Core capabilities and add-ons are built into the EKS Auto Mode nodes, eliminating the need for users to manage and maintain these components.
- Worker nodes management (powered by Karpenter) handles worker node health automatically, with the ability to delete and replace nodes with the ideal instance types for cost efficiency.

EC2 managed instances

At its lowest level, EKS Auto Mode uses a new feature announced at re:Invent 2024: EC2 managed instances. Managed instances are standard EC2 instances, except that you have delegated operational control over the instance to an AWS service such as Amazon EKS. In essence, you give up some direct control in exchange for lower operational overhead and improved security. For example, you no longer manually detach an Amazon Elastic Block Store (Amazon EBS) volume from an EKS Auto Mode node. The managed storage capability is responsible for attaching and detaching the EBS volumes that workloads need.
Similarly, you don’t SSH directly into nodes, but you always retain the ability to interact with and troubleshoot instances through the Amazon EKS service. Cleaning up EKS clusters is also more direct, because deleting the EKS cluster forces termination of the EC2 managed instances associated with it.

EC2 managed instances enable EKS Auto Mode to seamlessly manage compute capacity for Kubernetes workloads, so you can focus on deploying and running applications instead of managing nodes. Your workloads run on instances held to the same standard as if AWS were running workloads on your behalf (for example, AWS Lambda or AWS Fargate). When workloads exceed the capacity of existing nodes, EKS Auto Mode uses Karpenter to dynamically provision more EC2 managed instances (Karpenter is covered in detail in the Worker nodes management section below). This maintains high availability and performance and eliminates the need for manual scaling adjustments.

Beyond streamlining infrastructure management, EC2 managed instances also help optimize costs. They can use existing AWS cost-saving mechanisms, such as Reserved Instances and Savings Plans, providing flexibility while keeping expenses predictable. Using EC2 managed instances through EKS Auto Mode gives users a fully managed Kubernetes experience in which AWS handles the infrastructure, freeing them to concentrate on application development and scaling.

Bottlerocket as the choice of operating system

EC2 managed instances, like all other EC2 instances, need an operating system. EKS Auto Mode uses Bottlerocket, an open source operating system purpose-built by AWS for running containers. Bottlerocket is an ideal base operating system for EKS Auto Mode because it is designed to run Kubernetes workloads efficiently and securely. Bottlerocket includes only the essential software needed for running containers.
The open source project maintains around 100 package definitions, compared to 50,000 in a large general-purpose operating system. This means unnecessary features and dependencies are disabled at build time. That reduces the surface area for potential CVEs and leaves more resources for workloads, because services that container use cases don’t need are eliminated. Bottlerocket also enforces cryptographic integrity checks for the root filesystem and mandatory access controls, such as SELinux, to reduce the attack surface in the event of a container escape.

The EKS Auto Mode Amazon Machine Images (AMIs) are custom Bottlerocket variants built with Bottlerocket’s new Out of Tree Build (OOTB) system, a streamlined mechanism for creating custom Bottlerocket variants. These variants declare a dependency on Bottlerocket’s core in a way that lets the custom variant evolve independently and consume Bottlerocket core updates safely. The kernel kit and core kit form the core of Bottlerocket. EKS Auto Mode AMIs consume both kits from the Bottlerocket project to benefit from its focus on security and carefully curated dependencies.

As well as providing new EKS Auto Mode-specific node capabilities, the Auto Mode AMI also changes some of the base Bottlerocket configuration. For example, standard Bottlerocket doesn’t support interactive login through SSH or the console. Instead, it lets users bring in a special type of container, called a host container, to provide access. EKS Auto Mode disables Bottlerocket host containers altogether. In their place, it provides Kubernetes-native mechanisms for retrieving node logs and interacting directly with the node, such as retrieving node troubleshooting information when the node has no network connectivity. EKS Auto Mode also uses new Bottlerocket features such as bootstrap commands to configure local instance storage on the node for the instance types that support it.
This ensures that the fast ephemeral local storage included with those instance types is automatically used for container images, pod ephemeral data, and logs.

Core node capabilities

A Kubernetes node typically needs a number of DaemonSets, each creating one pod per node, to provide essential node-level functionality. Users frequently told us that managing and patching these components, and ensuring compatibility among them, was challenging and time-consuming. As part of EKS Auto Mode, we eliminated that undifferentiated heavy lifting for the most commonly used components. We began by identifying which core node capabilities were used in the vast majority of EKS clusters. We then built those capabilities directly into EKS Auto Mode nodes:

- Networking: node-local networking configuration, DNS, and network policy enforcement
- Storage: operating-system-level configuration of Persistent Volumes backed by Amazon EBS, and local instance storage for ephemeral data
- Identity: provides IAM identities to configured pods
- Specialized hardware support:
  - AWS Neuron: drivers and device plugin to make Inferentia and Trainium accelerators available to pods
  - NVIDIA: drivers and device plugin to make NVIDIA GPUs available to pods
  - Elastic Fabric Adapter (EFA): drivers and device plugin to make EFA devices available to pods
- Health: node health monitoring, enabling reporting and automatic repair of certain failure modes

Therefore, with an EKS Auto Mode cluster you can create a pod with the following properties:

- It has correctly configured networking, including Network Policy enforcement.
- It uses EKS Pod Identity to make requests to AWS services.
- It stores data on an EBS-backed Persistent Volume.

If your node pool supports accelerated instance types, that same pod can also use Neuron accelerators or NVIDIA GPUs by requesting the resource through the pod’s resource requests.
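As an illustration of the pod described above, the following sketch builds the manifests as plain Python dictionaries and prints them as JSON, which Kubernetes accepts as well as YAML. The object names, the image, and the StorageClass name `auto-ebs-sc` are hypothetical placeholders.

Keep in mind the following assumptions:

- The Pod Identity association that maps the service account to an IAM role is created separately through the EKS API. It is not part of the pod spec.
- Network policies are separate objects and are not shown here.
- `nvidia.com/gpu` is the resource name advertised by the NVIDIA device plugin. For Neuron accelerators you would substitute the Neuron resource name (typically `aws.amazon.com/neuron`).

```python
import json


def ebs_claim(name, storage_class, size="10Gi"):
    """PersistentVolumeClaim provisioned from an EBS-backed StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # An EBS volume attaches to a single node at a time.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def gpu_pod(name, image, claim, service_account, gpus=1):
    """Pod that requests NVIDIA GPUs and mounts the given claim at /data."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # With EKS Pod Identity, IAM credentials are tied to this service account.
            "serviceAccountName": service_account,
            "containers": [{
                "name": "app",
                "image": image,
                # Extended resources such as GPUs are set as limits;
                # the request defaults to the same value.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{"name": "data",
                         "persistentVolumeClaim": {"claimName": claim}}],
        },
    }


if __name__ == "__main__":
    manifests = [
        ebs_claim("app-data", "auto-ebs-sc"),        # hypothetical StorageClass name
        gpu_pod("my-app", "example.com/my-app:1.0",  # hypothetical image
                "app-data", "my-app-sa"),            # hypothetical service account
    ]
    # A v1 List wraps both objects in a single JSON document.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Running the script prints one JSON document. You can pipe it to `kubectl apply -f -`. A scheduler that honors the GPU limit will only place the pod on a node from a node pool that offers accelerated instance types.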
To further reduce the burden on users, built-in health monitoring periodically checks for a set of issues and failure modes that we have identified over time by operating Amazon EKS. It reports them through Kubernetes events and conditions. For error cases such as an unresponsive kubelet or exhaustion of all process IDs, EKS Auto Mode can automatically repair the node by replacing it, minimizing disruption to applications.

Worker nodes management

The EKS Auto Mode compute capability, powered by Karpenter, combines EC2 managed instances and the Bottlerocket-based EKS Auto Mode AMIs to create EKS Auto Mode nodes. It automatically launches and terminates those nodes as needed to provide the compute capacity your workloads require. These nodes are continuously optimized for cost efficiency based on your configured node pools and the workload requirements within your cluster.

The process begins by identifying the most cost-effective instance types that meet your workload requirements (such as CPU, memory, or specialized hardware needs) and their Kubernetes scheduling constraints. You can precisely control instance type selection through workload or node pool requirements. Alternatively, you can maintain flexibility by allowing a broader range of instance types, potentially reducing costs. When suitable instances are identified, EKS Auto Mode launches the necessary EC2 managed instances using the Auto Mode AMIs compatible with those instance types.

Over time, your workload requirements may change as workloads scale up or down to meet demand, or as workloads are added to or removed from the cluster. EKS Auto Mode continuously evaluates your entire cluster as part of its consolidation process to determine whether it can run the workloads more cost-effectively. At a high level, it uses two methods to achieve this:

- Node deletion: a node is eligible for deletion when all of its pods can run on the available capacity of other nodes in the cluster.
- Node replacement: a node can be replaced when all of its pods can be redistributed across the available capacity of existing nodes plus a single lower-cost replacement node.

This seamless integration of rapid node provisioning and continuous cost optimization allows you to focus on the workloads in your cluster while EKS Auto Mode handles node management tasks.

Lifecycle and maintenance

Prior to EKS Auto Mode, users were responsible for three things:

- validating the compatibility of node-level components with their EKS cluster control plane
- deploying those components
- keeping them patched and up to date

Features such as Cluster Insights reduced some of that burden by surfacing incompatibilities, but deployment and patching remained a user responsibility. With EKS Auto Mode, AWS takes on that responsibility. We continually make a tested, up-to-date AMI, which includes all core node capabilities, available for all EKS Auto Mode nodes.

Our Auto Mode AMI build and release process is driven by a continuous deployment pipeline that is responsible for:

- CVE scanning
- AMI building
- AMI validation
- Kubernetes conformance testing
- Component functional testing (for example, validating that pods can obtain IAM credentials through EKS Pod Identity)
- Security testing
- AMI deployment

The pipeline adheres to our standard best practices for safe, hands-off deployments. The process begins by deploying the newly tested AMI to a small subset of EKS Auto Mode clusters in a single Region, with a bake time to detect potential issues. As confidence in the AMI’s stability grows, it is gradually rolled out to more clusters in larger waves and across more AWS Regions, while the bake time between deployments is reduced.

EKS Auto Mode lets users control and trigger all control plane upgrades. For data plane updates, users can use Pod Disruption Budgets and Node Disruption Budgets to manage the update process.
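The toy Python model below makes the two consolidation methods from Worker nodes management concrete: node deletion and node replacement. It also shows how a node-level disruption budget limits them. This is not the EKS Auto Mode or Karpenter implementation.

The model makes these simplifications and assumptions:

- It packs pods with first-fit decreasing, a simple bin-packing heuristic chosen for illustration.
- It looks only at CPU and memory requests and ignores scheduling constraints.
- Instance names and prices are hypothetical.
- As in open-source Karpenter, the node disruption budget caps how many nodes consolidation may disrupt at once.
- Rounding a percentage budget up is an assumption of this sketch.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu: int  # requested millicores
    mem: int  # requested MiB


@dataclass
class Node:
    name: str
    cpu: int      # allocatable millicores
    mem: int      # allocatable MiB
    price: float  # hypothetical $/hour
    pods: list = field(default_factory=list)

    def free(self):
        return (self.cpu - sum(p.cpu for p in self.pods),
                self.mem - sum(p.mem for p in self.pods))


def place(pods, nodes):
    """First-fit decreasing into the nodes' free capacity; nodes are not modified.
    Returns ({pod name: node name}, [pods that did not fit])."""
    free = {n.name: list(n.free()) for n in nodes}
    assignment, leftovers = {}, []
    for pod in sorted(pods, key=lambda p: (p.cpu, p.mem), reverse=True):
        for n in nodes:
            f = free[n.name]
            if pod.cpu <= f[0] and pod.mem <= f[1]:
                f[0] -= pod.cpu
                f[1] -= pod.mem
                assignment[pod.name] = n.name
                break
        else:
            leftovers.append(pod)
    return assignment, leftovers


def cheapest_fit(pods, catalog):
    """Cheapest (name, cpu, mem, price) catalog entry that holds all pods, or None."""
    cpu, mem = sum(p.cpu for p in pods), sum(p.mem for p in pods)
    fits = [c for c in catalog if c[1] >= cpu and c[2] >= mem]
    return min(fits, key=lambda c: c[3], default=None)


def allowed_disruptions(budget, total_nodes):
    """Budget as a node count (2) or a percentage string ("10%"), rounded up."""
    if isinstance(budget, str) and budget.endswith("%"):
        return math.ceil(total_nodes * int(budget[:-1]) / 100)
    return int(budget)


def plan_consolidation(nodes, catalog, budget):
    """Greedy list of ("delete", node) and ("replace", node, instance) actions."""
    limit = allowed_disruptions(budget, len(nodes))
    # Work on copies so the caller's nodes are not modified.
    cluster = {n.name: Node(n.name, n.cpu, n.mem, n.price, list(n.pods)) for n in nodes}
    actions = []
    # Most expensive nodes first, since removing them saves the most.
    for name in sorted(cluster, key=lambda k: cluster[k].price, reverse=True):
        if len(actions) >= limit:
            break  # node disruption budget used up
        node = cluster[name]
        others = [n for n in cluster.values() if n is not node]
        assignment, leftovers = place(node.pods, others)
        replacement = None
        if leftovers:  # node replacement: leftovers need one strictly cheaper node
            new = cheapest_fit(leftovers, catalog)
            if new is None or new[3] >= node.price:
                continue
            replacement = Node(f"{name}-new", new[1], new[2], new[3], leftovers)
            actions.append(("replace", name, new[0]))
        else:  # node deletion: every pod fits on existing nodes
            actions.append(("delete", name))
        # Apply the action so later decisions see the updated free capacity.
        for pod in node.pods:
            if pod.name in assignment:
                cluster[assignment[pod.name]].pods.append(pod)
        del cluster[name]
        if replacement is not None:
            cluster[replacement.name] = replacement
    return actions


if __name__ == "__main__":
    nodes = [Node("a", 4000, 8192, 0.20, [Pod("p1", 500, 512)]),
             Node("b", 4000, 8192, 0.20, [Pod("p2", 3000, 4096)]),
             Node("c", 8000, 16384, 0.40, [Pod("p3", 1000, 1024)])]
    catalog = [("small", 2000, 4096, 0.08), ("medium", 4000, 8192, 0.20)]
    print(plan_consolidation(nodes, catalog, "10%"))  # [('delete', 'c')]
    print(plan_consolidation(nodes, catalog, 2))      # [('delete', 'c'), ('replace', 'a', 'small')]
```

With a 10% budget across three nodes, only one disruption is allowed. The first run therefore deletes the most expensive node and stops, even though further savings exist. Raising the budget to 2 also lets the model replace node `a` with a cheaper instance. Karpenter additionally weighs factors this sketch ignores, such as Pod Disruption Budgets and scheduling constraints.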
Pod Disruption Budgets and Node Disruption Budgets offer granular control at the pod and node level, respectively:

- Pod Disruption Budgets define the maximum number of disruptions allowed for specific workloads during updates.
- Node Disruption Budgets let users specify maintenance windows and control the number of nodes that can be updated simultaneously.

When AWS releases new EKS Auto Mode AMIs with security patches, EKS Auto Mode can automatically upgrade the worker nodes while respecting Kubernetes scheduling constraints and configured disruption budgets. The Amazon EKS best practices documentation provides detailed guidance on implementing these controls effectively, including specific recommendations for maintaining application reliability during updates.

Conclusion

Amazon EKS Auto Mode represents a significant evolution in how users can run Kubernetes on AWS. It combines three elements:

- Amazon EC2 managed instances for secure compute management
- Bottlerocket for a container-optimized operating system
- built-in node capabilities for essential functionality

Together, these let users shift their focus from infrastructure management to application development. Instead of spending time configuring node components, managing security patches, and maintaining operational tools, teams can concentrate on deploying and scaling the applications that matter to their business.

Ready to get started with EKS Auto Mode? You can deploy a new EKS Auto Mode cluster, or enable EKS Auto Mode on an existing cluster, using eksctl, the AWS CLI, the AWS Management Console, the EKS APIs, or your preferred infrastructure-as-code tools. Try our hands-on workshop, which guides you through deploying workloads and exploring Auto Mode’s capabilities. You can run it in your own AWS account or register for an AWS-hosted event.