Posted March 5

This post was jointly authored by Praseeda Sathaye (Principal Solutions Architect, Containers & OSS), AJ Davis (AWS Enterprise Support) and Arvind Viswanathan (Principal Solutions Architect).

Introduction

In the rapidly evolving world of containerized applications, maintaining resilience and observability across Kubernetes environments has become a critical challenge. As organizations increasingly adopt Amazon Elastic Kubernetes Service (Amazon EKS) to manage their containerized workloads, they need mechanisms for cluster version lifecycle management and cluster discovery. As Amazon EKS environments grow more complex and span multiple AWS Regions and accounts, users often struggle to track cluster versions, support lifecycles, and overall deployment status. Proactively monitoring EKS cluster lifecycles and end-of-support dates is essential to keeping Kubernetes deployments secure, stable, and compliant. Furthermore, visibility into EKS cluster deployments across an entire AWS Organization is essential for effective resource management, strategic planning, and an accurate inventory.

In this post, to address these pain points, we share two solutions that provide observability of EKS clusters:

- End of support notifications
- Discovery and reporting

The first solution uses AWS Health, Amazon EventBridge, and Amazon Simple Notification Service (Amazon SNS)/Amazon Simple Queue Service (Amazon SQS) to monitor Amazon EKS-specific events, particularly for clusters approaching end of support (standard and extended). By delivering early notifications when an EKS cluster is nearing the end of its support window, this solution lets you proactively plan and upgrade your cluster's Kubernetes version.
Complementing this, the second solution is an automated discovery and reporting mechanism that identifies and aggregates detailed information about EKS clusters across all AWS Regions and accounts within your Organization. Visibility into cluster versions, associated tags, and other key details facilitates compliance checks, accurate resource inventory management, and strategic upgrade planning.

Together, these two solutions provide a framework for EKS cluster lifecycle management, helping organizations stay ahead of potential issues, optimize resource usage, and make informed upgrade decisions.

Prerequisites

You need the following to complete the walkthrough:

- An AWS account with Organizations enabled
- A Business, Enterprise On-Ramp, or Enterprise Support plan from AWS Support to use the AWS Health API
- Basic knowledge of Amazon EKS, AWS Health, EventBridge, AWS Lambda, AWS Identity and Access Management (IAM), Amazon S3, Amazon SNS, Amazon SQS, and AWS Cloud Development Kit (AWS CDK)
- The ability to delegate permissions from the management account to a tooling account that is used to centralize notifications and perform EKS cluster discovery across the entire Organization
- Knowledge of Python

Initial setup

The following steps guide you through the initial setup.

Enable AWS Health Organizational View within the management account

Enable Organizational View in AWS Health to obtain a centralized, aggregated view of AWS Health events across your entire Organization. You can verify that this is enabled through the console or by running the following AWS Command Line Interface (AWS CLI) command:

    aws health describe-health-service-status-for-organization

You should see the following output:

    {"healthServiceAccessStatusForOrganization": "ENABLED"}

A Business, Enterprise On-Ramp, or Enterprise Support plan from AWS Support is required to use the AWS Health API and to complete this step.
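If you prefer to script this check, the following minimal Python sketch wraps the same AWS Health API operation with boto3. It assumes management-account credentials and that the AWS Health API is called through the us-east-1 endpoint; the helper names are hypothetical. Passing enable=True calls EnableHealthServiceAccessForOrganization, which turns on Organizational View if it is currently off.

```python
"""Check (and optionally enable) AWS Health Organizational View with boto3."""
from typing import Any


def org_view_enabled(health_client: Any) -> bool:
    """Return True if Organizational View is enabled for the Organization."""
    response = health_client.describe_health_service_status_for_organization()
    return response.get("healthServiceAccessStatusForOrganization") == "ENABLED"


def ensure_org_view(health_client: Any, enable: bool = False) -> bool:
    """Return True if Organizational View is enabled or was just requested.

    With enable=False this only reports the status and never changes anything.
    """
    if org_view_enabled(health_client):
        return True
    if enable:
        # The change can take a while to propagate across the Organization.
        health_client.enable_health_service_access_for_organization()
        return True
    return False


def make_health_client() -> Any:
    """Create an AWS Health client using the default credential chain."""
    import boto3  # lazy import so the helpers above can be tested without boto3

    # Assumption: the AWS Health API is reached through the us-east-1 endpoint.
    return boto3.client("health", region_name="us-east-1")
```

For example, ensure_org_view(make_health_client()) returns True when Organizational View is already enabled and False otherwise, without modifying anything. Taking the client as a parameter keeps the logic independent of boto3, so it can be exercised with a stub before it ever touches the management account.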
Delegate administration from the management account to a central tooling account

Set up an AWS account within the Organization to be the tooling account for this solution. This account is used to centralize notifications and discovery. From the management account, delegate AWS CloudFormation StackSets administration by following the steps described in this post: CloudFormation StackSets delegated administration. You can achieve the same result by running the following command from the management account. Replace 012345678901 with the AWS account ID of your tooling account.

    aws organizations register-delegated-administrator \
      --service-principal=member.org.stacksets.cloudformation.amazonaws.com \
      --account-id="012345678901"

This is the only time you need to access the management account. The remaining steps are completed from within the tooling account.

Bootstrap AWS CDK

Choose a primary Region where all reporting and events are consolidated within the central tooling account, and set the AWS_DEFAULT_REGION environment variable to this Region. For the discovery and reporting solution, you must bootstrap AWS CDK in this primary Region across the entire Organization. AWS CDK must also be bootstrapped in every AWS Region where EKS clusters are deployed in order to receive end of support notifications for those clusters. To keep this walkthrough simple, we deploy the resources only to the primary Region you have chosen. The steps to bootstrap AWS CDK across multiple AWS Regions and accounts are available in this post: Bootstrapping multiple AWS accounts for AWS CDK using CloudFormation StackSets.

Download the AWS CDK stacks

We provide AWS CDK stacks so that you can quickly deploy the solution in your environment.
Download the code from our GitHub repository and set up the environment by running the following commands within the cdk directory:

    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

Walkthrough

The following steps walk you through both solutions.

Solution 1: EKS cluster end of support notifications

Our first solution addresses the need for timely awareness of EKS cluster lifecycle events, particularly approaching end-of-standard-support dates. Using AWS Health, EventBridge, and Amazon SNS (and optionally Amazon SQS), we created a centralized system that:

- Monitors AWS Health events across multiple AWS Regions and accounts
- Focuses on Amazon EKS-specific events, specifically the AWS_EKS_PLANNED_LIFECYCLE_EVENT
- Provides early notifications when an EKS cluster is 180 days away from reaching the end of its standard support and extended support periods

This centralized approach gives Amazon EKS users sufficient time to plan and execute version upgrades, maintaining the security and stability of their Kubernetes environments, as shown in the following figure.

Figure 1: Solution overview – end of support notifications

Step 1: Deploy the eks-health-events AWS CDK stack

Deploy the eks-health-events AWS CDK stack to the central tooling account using the following command:

    cdk deploy eks-health-events --app "python3 tooling_account.py" --require-approval never

This deploys the AWS CDK app in tooling_account.py, which provisions the following resources in the central tooling account:

- Event bus
- SNS topic and SQS queue to monitor events
- EventBridge rule to forward planned lifecycle events for Amazon EKS to Amazon SNS
- EventBridge rule to forward planned lifecycle events for Amazon EKS to Amazon SQS
- Resource policies for the event rules to publish to Amazon SNS and Amazon SQS

Step 2: Deploy the eks-health-events-stack-set AWS CDK stack

Deploy the eks-health-events-stack-set AWS CDK stack using the following command:
    cdk deploy eks-health-events-stack-set --app "python stack_sets.py" --require-approval never

This uses CloudFormation StackSets to deploy the following resources to the chosen primary Region in every account in the Organization except the management account:

- Local event bus
- EventBridge rule to forward planned lifecycle events for Amazon EKS to the central event bus that was provisioned in Step 1
- Resource policies for the event rules to publish to the central event bus

Step 3: Configure SNS notifications

Browse to the Amazon SNS console, locate the topic named eks-health-events-EKSHealthEvents-<primary region>, and create a subscription to it (for example, a group email address).

Step 4: Validate the solution

You can inspect and validate that the EventBridge rules, SQS queue, and SNS topic were created by the CloudFormation stacks named eks-health-events and eks-health-events-stack-set. From this point on, when any of your EKS clusters is 180 days away from reaching the end of support (standard or extended), the EventBridge rules match the resulting AWS Health event and deliver it to Amazon SNS and/or Amazon SQS, as shown in the following figures.

Figure 2: Validate EventBridge deployment
Figure 3: Validate SQS deployment
Figure 4: Validate SNS deployment
Figure 5: Sample end of support notification

Solution 2: EKS cluster discovery and reporting

Complementing the end of support notifications solution, our second solution offers a comprehensive view of EKS clusters across an entire Organization.
This solution:

- Identifies EKS clusters in all AWS Regions and accounts within an Organization
- Collects detailed information about each cluster, such as account details, Region, cluster name, version, and associated tags
- Aggregates data on cluster versions, providing insights into version distribution
- Generates both detailed and summary reports, stored centrally for direct access

This organization-wide visibility enables teams to maintain an accurate inventory of Amazon EKS resources, facilitate compliance checks, and support strategic upgrade planning, as shown in the following figure.

Figure 6: Solution overview – discovery and reporting

Step 1: Deploy the eks-discovery-lambda AWS CDK stack

Deploy the eks-discovery-lambda AWS CDK stack to the central tooling account using the following command:

    cdk deploy eks-discovery-lambda --app "python3 tooling_account.py" --require-approval never

This deploys the AWS CDK stack named eks-discovery-lambda, defined in tooling_account.py, which provisions the following resources in the central tooling account:

- Lambda function to discover EKS clusters across all AWS Regions and accounts
- S3 bucket to store results
- SNS topic for notifications
- EventBridge scheduler for recurring execution
- Necessary IAM roles and policies

The Lambda function collects cluster details, generates reports, and sends notifications.

Step 2: Modify the EventBridge scheduler as needed

To customize the EKS cluster discovery schedule, navigate to EventBridge and, under Schedules, find the newly created EKSDiscoveryWeeklySchedule. This is a cron-based schedule, as shown in the following figure.

Figure 7: Customize schedule for cluster discovery

To receive notifications from Amazon SNS, you must create a subscription to the topic. Navigate to the Amazon SNS console, locate the newly created topic named EKSDiscoverySNSTopic, and create a subscription with the protocol that meets your requirements (for example, email to a group).
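The following Python sketch illustrates the discovery and reporting logic described above. It is not the solution's actual Lambda code: it lists and describes the clusters in each account and Region, counts them by Kubernetes version, and packages a detailed CSV plus a version summary into a single zip archive. The helper names, the CSV file names, and the cross-account role name you would pass to make_eks_client_factory are hypothetical; the boto3 operations it relies on (sts.assume_role, eks.list_clusters, eks.describe_cluster) are standard AWS API calls.

```python
"""Sketch of organization-wide EKS discovery and reporting (hypothetical helpers)."""
import csv
import io
import zipfile
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List

EksClientFactory = Callable[[str, str], Any]  # (account_id, region) -> EKS client
FIELDS = ["account_id", "region", "cluster_name", "version", "tags"]


def discover_clusters(accounts: Iterable[str], regions: Iterable[str],
                      eks_client_factory: EksClientFactory) -> List[Dict[str, str]]:
    """List and describe every EKS cluster in each (account, Region) pair."""
    region_list = list(regions)  # iterated once per account
    rows: List[Dict[str, str]] = []
    for account_id in accounts:
        for region in region_list:
            eks = eks_client_factory(account_id, region)
            kwargs: Dict[str, str] = {}
            while True:  # follow nextToken pagination of ListClusters
                page = eks.list_clusters(**kwargs)
                for name in page.get("clusters", []):
                    cluster = eks.describe_cluster(name=name)["cluster"]
                    tags = cluster.get("tags", {})
                    rows.append({
                        "account_id": account_id,
                        "region": region,
                        "cluster_name": cluster["name"],
                        "version": cluster["version"],
                        "tags": ";".join(f"{k}={v}" for k, v in sorted(tags.items())),
                    })
                if not page.get("nextToken"):
                    break
                kwargs = {"nextToken": page["nextToken"]}
    return rows


def _version_key(version: str):
    """Sort "1.9" before "1.10" by comparing numeric parts."""
    return tuple(int(part) for part in version.split("."))


def summarize_by_version(rows: List[Dict[str, str]]) -> Dict[str, int]:
    """Count clusters per Kubernetes version (the summary report)."""
    counts = Counter(row["version"] for row in rows)
    return {version: counts[version] for version in sorted(counts, key=_version_key)}


def build_report_zip(rows: List[Dict[str, str]]) -> bytes:
    """Package a detailed CSV and a per-version summary CSV into one zip archive."""
    detail = io.StringIO()
    detail_writer = csv.DictWriter(detail, fieldnames=FIELDS)
    detail_writer.writeheader()
    detail_writer.writerows(rows)

    summary = io.StringIO()
    summary_writer = csv.writer(summary)
    summary_writer.writerow(["version", "cluster_count"])
    summary_writer.writerows(summarize_by_version(rows).items())

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("eks_clusters.csv", detail.getvalue())
        archive.writestr("eks_version_summary.csv", summary.getvalue())
    return buffer.getvalue()


def make_eks_client_factory(role_name: str) -> EksClientFactory:
    """Return a factory that assumes role_name in each account (requires boto3)."""
    import boto3  # lazy import keeps the pure helpers testable without boto3

    sts = boto3.client("sts")
    credentials: Dict[str, Any] = {}  # cache one set of credentials per account

    def factory(account_id: str, region: str) -> Any:
        if account_id not in credentials:
            credentials[account_id] = sts.assume_role(
                RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
                RoleSessionName="eks-discovery",
            )["Credentials"]
        creds = credentials[account_id]
        return boto3.client(
            "eks", region_name=region,
            aws_access_key_id=creds["AccessKeyId"],
            aws_secret_access_key=creds["SecretAccessKey"],
            aws_session_token=creds["SessionToken"],
        )

    return factory
```

To run it against a real Organization, you could collect account IDs with the Organizations list_accounts paginator and Regions with boto3.session.Session().get_available_regions("eks"), then upload the result of build_report_zip with s3.put_object. Injecting the client factory keeps discovery testable with fakes; a production function would also likely parallelize the account and Region loops and handle AccessDenied errors for accounts where the cross-account role is absent.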
Step 3: Deploy the cross-account role that the Lambda function assumes to perform discovery

The Lambda function you deployed in Step 1 relies on a cross-account role in each account within the Organization to perform cluster discovery. Deploy the eks-discovery-stack-set AWS CDK stack, which rolls out this cross-account role:

    cdk deploy eks-discovery-stack-set --app "python stack_sets.py" --require-approval never

Step 4: Validate the solution

To validate the solution, navigate to the newly created Lambda function and run a test with a new test event whose payload is an empty JSON object ({}). When the function completes, verify that the S3 bucket received the zip file and confirm that you received an SNS notification, as shown in the following figures.

Figure 8: Sample output of cluster discovery in S3 bucket
Figure 9: Sample contents of output file
Figure 10: Sample list of clusters
Figure 11: Sample count of clusters by version

Step 5: (Optional) Monitor the solution

To monitor the solution, set up Amazon CloudWatch alarms on the Lambda function's executions and errors. Regularly review the generated reports in the S3 bucket, and periodically review and update the IAM permissions as needed.

Troubleshooting

- Make sure that all IAM roles and policies are correctly set up and have the necessary permissions.
- Check CloudWatch Logs and metrics for errors in the Lambda function or EventBridge rules.

Security considerations

- Review and adjust the IAM roles and policies to adhere to the principle of least privilege and to fit your environment.
- Regularly audit access to the centralized event management system.

Cleaning up

Run the following commands to clean up the resources provisioned:

    cdk destroy --app "python stack_sets.py" --all --force
    cdk destroy --all --force

The first command deletes the CloudFormation StackSets that were deployed throughout the Organization using the AWS CDK app named stack_sets.py.
The second command cleans up the resources provisioned within the central tooling account using the AWS CDK app named tooling_account.py.

Conclusion

This guide helps you set up a system using AWS services that provides proactive end of support notifications for EKS clusters. These notifications enable timely upgrade planning, mitigating the risks of running outdated clusters while maintaining security, stability, and compliance. The Amazon EKS cluster discovery and reporting solution, in turn, gives you an Organization-wide view of complex, multi-account Kubernetes environments on AWS. It improves visibility, streamlines compliance efforts, and supports informed decisions about cluster upgrades and resource allocation.

As organizations continue to scale their containerized applications, these solutions help teams maintain a clear overview of their Amazon EKS landscape, optimize resource usage, and apply consistent management practices across diverse deployments, improving the observability, resilience, and governance of their Amazon EKS environments.

As a final call to action, we recommend trying both solutions to begin enhancing your EKS cluster observability today!