Showing results for tags 'kubernetes'.

  1. This is a guest post by Pranav Kapoor, Head of DevOps at Upstox, co-authored with Jayesh Vartak, Solutions Architect at AWS, and Jitendra Shihani, Technical Account Manager (TAM) at AWS.

Upstox is India's largest investech, a multi-unicorn valued at $3.5 billion. It allows you to buy and sell stocks, mutual funds, and derivatives, and is loved and trusted by over 12 million customers. It is backed by Mr. Ratan Tata and Tiger Global and is the official partner for the Tata IPL (Indian Premier League). Upstox experienced 10x growth during the pandemic, with the number of users increasing from 1 million to over 10 million in 2022. To sustain this exponential growth and prepare for future expansion, Upstox set high standards for running its trading platform, covering availability, scalability, security, operational efficiency, and cost optimization:

- Availability: Raise the availability SLA from 99.9% to at least 99.99%.
- Scalability: Reduce scaling lag and provide superior performance even during sudden bursts of traffic, such as market opening hours, budget day announcements, or market news.
- Security: Implement more guardrails to improve the security posture, with additional focus on data privacy, handling of Personally Identifiable Information (PII), and storing and processing customer data in a highly secure way, while streamlining the auditing and compliance processes.
- Operational efficiency: Create new infrastructure or environments in an automated way using Infrastructure as Code (IaC), and incorporate chaos engineering into the release lifecycle.
- Cost optimization: Reduce infrastructure and operational cost without compromising availability, security, or scalability.

To meet these targets, Upstox embarked on the journey of building a next-generation platform called "Greenfield". Upstox chose Amazon Elastic Kubernetes Service (Amazon EKS) as the core compute platform for Greenfield to leverage the benefits of containers, whereas the earlier platform ran on Amazon Elastic Compute Cloud (Amazon EC2). In this post, we share the tenets followed to build Greenfield, its core differentiators, and the outcomes.

Upstox Greenfield philosophy

To build a future-proof architecture that can handle further exponential growth in traffic while remaining flexible enough to evolve over time, we used the following tenets:

- Security: Security is "job zero" at AWS and Upstox. It is the top priority in every aspect of the platform: least-privilege access, mandatory encryption for data at rest and in transit, no SSH key sharing, and mandatory AWS Identity and Access Management (IAM) role-based access (no sharing of IAM access keys and secret access keys).
- Customer experience: The architecture focuses on availability, performance, scalability, and resiliency to deliver the best customer experience. We set a simple principle: any server belonging to any service can be terminated at any point in time without impacting the customer experience.
- Smart defaults: Simplicity is key to the long-term success of the platform. Kubernetes offers a wide range of capabilities, but using all of them would quickly make the platform complex. We therefore kept the platform simple yet powerful by selecting only a few Kubernetes capabilities, and followed the principle of building services with smart defaults. The idea is that all services come with default settings, so new users can easily run their applications without getting into the complexities of the platform, while advanced users can still customize the defaults for their use cases. For example, the default minimum number of pods is two, and zone topology spread constraints spread the pods across multiple Availability Zones (AZs), so by default the application is highly available. To avoid a single point of failure, application teams cannot override the minimum number of pods to less than two; they can only raise it. Keeping the platform simple, with smart defaults, has enabled all teams, such as development, QA, InfoSec, and Operations, to focus on understanding and leveraging a few yet powerful capabilities and become proficient in them (a sketch of such defaults follows this list).
- NoOps: The goal is to eliminate or minimize manual activities as much as possible. This involves having a continuous integration/continuous deployment (CI/CD) pipeline not only for applications but also for infrastructure (as code). A test-first approach along with chaos engineering is also vital to achieve NoOps. The objective is to empower teams with a self-service, automated platform and keep human touch-points to a minimum. For example, if a team wants to deploy an application and InfoSec needs to approve it, the human touch-point should be limited to the InfoSec team, eliminating the touch-point with the DevSecOps team. To keep the focus on automating everything, the DevSecOps team is composed mostly of developers, so they spend most of their time automating the platform rather than on day-to-day operations.
- Cost optimized: We wanted to build a cost-optimized platform. Along with the core cost optimization levers, such as right sizing, auto-scaling, AMD-powered instances, AWS Graviton, Spot Instances, Savings Plans, and Reserved Instances, the platform also focuses on leveraging service features (such as S3 Intelligent-Tiering) and re-architecting applications for cost optimization.
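To illustrate the kind of smart defaults described above, here is a minimal sketch of what a default workload spec could look like. It is an assumption for illustration only; the Deployment name, labels, and image are hypothetical, and the actual Upstox platform defaults are not published in this post:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: example-service            # hypothetical service name
  spec:
    replicas: 2                      # platform default: never fewer than two pods
    selector:
      matchLabels:
        app: example-service
    template:
      metadata:
        labels:
          app: example-service
      spec:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: topology.kubernetes.io/zone   # spread pods across AZs by default
            whenUnsatisfiable: ScheduleAnyway
            labelSelector:
              matchLabels:
                app: example-service
        containers:
          - name: app
            image: nginx             # placeholder image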
Approach

It took about a year to build the next-generation trading platform. In the planning phase, we evaluated the application portfolio across various dimensions, such as criticality, dependencies, complexity, containerization effort, testing effort, current cost, and roadmap. Based on this evaluation, we categorized the applications into five buckets:

- Two representative applications for identified capabilities
- Two mission-critical applications
- Top cost-contributing applications
- Remaining to-be-containerized applications
- Applications to be retained (not to be containerized)

We then migrated the applications in phases as follows.

Phase 1 – Two representative applications for identified capabilities: In the first phase, we selected a few representative applications that were good candidates for migration to Amazon EKS. This migration was completed successfully, allowing us to validate critical features such as gRPC, websockets, and load balancing. It also helped the team learn Docker, Kubernetes, and Amazon EKS; the team understood the nuances and became confident in migrating and running containers.

Phase 2 – Two mission-critical applications: In the second phase, we focused on the top two mission-critical applications. These applications provide the exchange's data feed to end-users in real time and also power the charts and graphs.
These applications are critical for end-users to make trading decisions. The team resolved all the challenges in migrating and running these mission-critical applications and learned the specifics of running mission-critical applications on Amazon EKS. This experience streamlined the later phases, as the majority of the use cases for forthcoming applications had already been addressed through the initial two representative and two mission-critical applications.

Phase 3 – Top cost-contributing applications: In the third phase, we picked up the top cost-contributing applications, including some complex ones. There were multiple benefits: apart from realizing cost savings early, we increased the agility, resiliency, scalability, and performance of these applications.

Phase 4 – Remaining to-be-containerized applications: In the fourth phase, we migrated the rest of the applications to Amazon EKS. With the learnings and insights gained in the earlier phases, we were able to expedite the migration significantly.

For the fifth bucket, we evaluated the remaining applications and concluded that containerization did not offer a favorable cost-benefit ratio, given the effort involved relative to the advantages and the future strategic plans for these applications. Therefore, we retained these applications as-is.

Greenfield differentiation

Scaling based on traffic pattern: Upstox did an in-depth analysis of daily traffic patterns. On a typical day, there is an exponential spike in traffic during the market's opening hour, approximately from 9:00 AM to 9:30 AM, followed by another surge before market close. Traffic varies throughout trading hours and is very low during non-trading hours (the original article includes a "Scaling based on Traffic Pattern" diagram). On a non-typical day, traffic varies with market news and events such as budget day announcements. Considering these aspects, Upstox implemented scaling along multiple dimensions: time-based, request-based, and memory/CPU-based. To further reduce or eliminate scaling lag, we leveraged the technique described in "Eliminate Kubernetes node scaling lag with pod priority and over-provisioning".

Application Load Balancer pre-warming: Because traffic spikes sharply in a short span of time during market opening hours, the Application Load Balancer (ALB) also needs to scale accordingly. The ALB can scale to handle up to double the traffic over the next five minutes, but the increase during market opening hours is much higher than that. To address this challenge, Upstox has been using ALB pre-warming, in which higher capacity (LCUs) is pre-provisioned to handle the sudden spike in traffic.

Karpenter: Upstox implemented the Karpenter autoscaler instead of Cluster Autoscaler to scale the Amazon EKS worker nodes in line with traffic patterns. We used the following key features of Karpenter:

Instance type selection: We used a broad set of EC2 instance types to mitigate the risk of instance unavailability and to optimize cost.

Node recycling: The EKS cluster's worker nodes use Amazon EKS-optimized Amazon Linux AMIs to run containers securely and performantly. To make sure that the worker nodes are always up to date with the latest security patches and fixes, we adopted a routine cycle of replacing them with the most recent Amazon EKS-optimized Amazon Machine Images (AMIs).
Therefore, by replacing worker nodes instead of doing in-place OS updates on existing nodes, we aligned with the best practice of treating infrastructure as immutable. We used the NodePool parameter spec.disruption.expireAfter to continuously replace nodes with the latest Amazon EKS-optimized AMI.

Consolidation: We further optimized cost by scaling in the worker nodes in line with the load. We used the NodePool parameter spec.disruption.consolidationPolicy to downsize over-provisioned nodes and reduce the number of nodes as load decreases and pods are removed. (A hedged NodePool sketch combining these disruption settings with capacity-type selection follows the cost-optimization notes below.)

Security: The Greenfield platform is highly secured, with a focus on data security, data privacy, access control, and compliance. Upstox leveraged the AWS Encryption SDK to manage PII data, and used AWS Systems Manager extensively as follows:

- Systems Manager Session Manager: We managed shell access using Session Manager and followed the least-access principle using IAM policies and roles, eliminating the sharing of SSH keys.
- Systems Manager Patch Manager: We used Patch Manager to periodically apply patches to the EC2 instances outside of the Amazon EKS clusters.
- Systems Manager Compliance: We used Compliance to generate audit reports such as patch compliance data.

Additional cost optimizations: To further optimize cost, we followed these advanced techniques:

- Right sizing: As we containerized the applications, based on load and performance testing, we chose the right size and type of EC2 instances to get the best performance at the lowest cost.
- AMD-powered instances for applications: AMD-powered EC2 instances can run general purpose, memory-intensive, burstable, compute-intensive, and graphics-intensive workloads at a significant price advantage relative to comparable offerings. By adopting AMD-powered EC2 instances for Amazon EKS, Upstox achieved a cost reduction of approximately 40-45% in the AWS Mumbai Region.
- Spot Instances: EC2 Spot Instances provide up to 90% cost savings compared to On-Demand Instances. Upstox migrated its Amazon EKS dev and staging environments to Spot Instances, with fallback to On-Demand Instances when Spot capacity is unavailable. This resulted in an overall 70% cost savings for the dev and staging environments.
- Graviton instances for managed services: The AWS Graviton processor offers up to 40% better price performance, and most managed services can be updated to use it. We therefore leveraged Graviton for managed services such as Amazon ElastiCache and Amazon Aurora.
- Amazon EBS gp3: Amazon EBS gp3 volumes offer up to 20% lower cost than previous-generation gp2 volumes and allow scaling performance independently of storage capacity. We migrated all EBS volumes from gp2 to gp3, lowering cost by approximately 18% while getting better performance.
- Amazon S3: With exponential growth in the business, data volume and cost also increased significantly. Upstox leveraged S3 Storage Lens to optimize Amazon Simple Storage Service (Amazon S3) cost, as outlined in the case study "Upstox Saves $1 Million Annually Using Amazon S3 Storage Lens".
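As referenced above, here is a minimal sketch of how the Karpenter settings discussed in this post (node recycling via expireAfter, consolidation, and a broad spot/on-demand instance selection) could be expressed, assuming the Karpenter v1beta1 NodePool API. The names and values are illustrative assumptions, not Upstox's actual configuration:

  apiVersion: karpenter.sh/v1beta1
  kind: NodePool
  metadata:
    name: general-purpose                       # hypothetical pool name
  spec:
    disruption:
      consolidationPolicy: WhenUnderutilized    # scale in and repack under-used nodes
      expireAfter: 720h                         # recycle nodes so fresh EKS-optimized AMIs roll in
    template:
      spec:
        nodeClassRef:
          name: default                         # hypothetical EC2NodeClass
        requirements:
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["spot", "on-demand"]       # Karpenter generally prefers spot when both are allowed
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64"]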
Architectural roadmap

As part of the architecture evolution, Upstox is considering the following roadmap:

- Multi-architecture CPU: Upstox is considering multi-architecture CPU support, x86 and ARM, for all of its workloads by leveraging Graviton instances alongside AMD instances. This diversification provides a wider selection of instances and mitigates the risk of instance shortages: if AMD instances are not available, the workload scales using Graviton instances, and vice versa. In turn, this improves resiliency while keeping cost optimized.
- Auto-scaling for databases: Currently, databases are provisioned for peak capacity and there is no auto-scaling. Upstox plans to implement auto-scaling for database read replicas so that the number of read replicas is adjusted automatically based on load, further optimizing cost (a hedged CLI sketch of one way to do this appears at the end of this post).
- Combination of On-Demand and Spot Instances for production workloads: Currently, the production environment uses On-Demand Instances; the plan is to use a combination of On-Demand and Spot Instances, which can further reduce cost.

Conclusion

In this post, we showed the detailed journey of Upstox in developing the Greenfield platform, a transformative project that has significantly enhanced customer experience, agility, security, and operational efficiency, all while reducing costs. The platform has not only shortened the release lifecycle, enabling faster delivery of new use cases, but also lowered operational costs despite handling increased volumes, without compromising on security, scalability, or performance. The Upstox initiative stands as a testament to how thoughtful innovation and strategic investment in technology can lead to substantial business benefits. View the full article
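On the database auto-scaling roadmap item above: one possible approach for Aurora read replicas is Application Auto Scaling. The following is a hedged sketch; the cluster name and capacity values are hypothetical:

  # Register the Aurora cluster's replica count as a scalable target
  aws application-autoscaling register-scalable-target \
    --service-namespace rds \
    --resource-id cluster:my-aurora-cluster \
    --scalable-dimension rds:cluster:ReadReplicaCount \
    --min-capacity 2 \
    --max-capacity 8

  # Track average reader CPU so replicas are added under load and removed when idle
  aws application-autoscaling put-scaling-policy \
    --service-namespace rds \
    --resource-id cluster:my-aurora-cluster \
    --scalable-dimension rds:cluster:ReadReplicaCount \
    --policy-name reader-cpu-target-tracking \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
        "TargetValue": 60.0,
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "RDSReaderAverageCPUUtilization"
        }
      }'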
  2. Building big data applications from open source software has become increasingly straightforward since the advent of projects like Data on EKS, an open source project from AWS that provides blueprints for building data and machine learning (ML) applications on Amazon Elastic Kubernetes Service (Amazon EKS). In the realm of big data, securing data in cloud applications is crucial. This post explores the deployment of Apache Ranger for permission management within the Hadoop ecosystem on Amazon EKS. We show how Ranger integrates with Hadoop components like Apache Hive, Spark, Trino, YARN, and HDFS, providing secure and efficient data management in a cloud environment. Join us as we navigate these advanced security strategies in the context of Kubernetes and cloud computing.

Overview of solution

The Amber Group's Data on EKS Platform (DEP) is a Kubernetes-based, cloud-centered big data platform that revolutionizes the way we handle data in EKS environments. Developed by Amber Group's Data Team, DEP integrates with familiar components like Apache Hive, Spark, Flink, Trino, HDFS, and more, making it a versatile and comprehensive solution for data management and BI platforms. (A solution architecture diagram accompanies the original article.)

Effective permission management is crucial for several key reasons:

- Enhanced security – With proper permission management, sensitive data is accessible only to authorized individuals, safeguarding against unauthorized access and potential security breaches. This is especially important in industries handling large volumes of sensitive or personal data.
- Operational efficiency – By defining clear user roles and permissions, organizations can streamline workflows and reduce administrative overhead. This simplifies managing user access, saves time for data security administrators, and minimizes the risk of configuration errors.
- Scalability and compliance – As businesses grow and evolve, a scalable permission management system helps smoothly adjust user roles and access rights. This adaptability is essential for maintaining compliance with data privacy regulations like GDPR and HIPAA, making sure that the organization's data practices are legally sound and up to date.
- Addressing big data challenges – Big data comes with unique challenges, like managing large volumes of rapidly evolving data across multiple platforms. Effective permission management helps tackle these challenges by controlling how data is accessed and used, providing data integrity and minimizing the risk of data breaches.

Apache Ranger is a comprehensive framework designed for data governance and security in Hadoop ecosystems. It provides a centralized framework to define, administer, and manage security policies consistently across various Hadoop components. Ranger specializes in fine-grained access control, offering detailed management of user permissions and auditing capabilities. Ranger's architecture is designed to integrate smoothly with various big data tools such as Hadoop, Hive, HBase, and Spark. The key components of Ranger include:

- Ranger Admin – The central component where all security policies are created and managed. It provides a web-based user interface for policy management and an API for programmatic configuration.
- Ranger UserSync – This service syncs user and group information from a directory service like LDAP or AD into Ranger.
- Ranger plugins – These are installed on each component of the Hadoop ecosystem (like Hive and HBase). Plugins pull policies from the Ranger Admin service and enforce them locally.
- Ranger Auditing – Ranger captures access audit logs and stores them for compliance and monitoring purposes. It can integrate with external tools for advanced analytics on these audit logs.
- Ranger Key Management Store (KMS) – Ranger KMS provides encryption and key management, extending Hadoop's HDFS Transparent Data Encryption (TDE).

Policy matching follows these priority levels (illustrated by a flowchart in the original article):

- Deny list takes precedence over allow list
- Deny list exclude has a higher priority than deny list
- Allow list exclude has a higher priority than allow list

Our Amazon EKS-based deployment includes the following components:

- S3 buckets – We use Amazon Simple Storage Service (Amazon S3) for scalable and durable Hive data storage
- MySQL database – The database stores Hive metadata, facilitating efficient metadata retrieval and management
- EKS cluster – The cluster is comprised of three distinct node groups: platform, Hadoop, and Trino, each tailored for specific operational needs
- Hadoop cluster applications – These include HDFS for distributed storage and YARN for managing cluster resources
- Trino cluster application – This application enables us to run distributed SQL queries for analytics
- Apache Ranger – Ranger serves as the central security management tool for access policy across the big data components
- OpenLDAP – This is integrated as the LDAP service to provide a centralized user information repository, essential for user authentication and authorization
- Other cloud services resources – Other resources include a dedicated VPC for network security and isolation

By the end of this deployment process, we will have realized the following benefits:

- A high-performing, scalable big data platform that can handle complex data workflows with ease
- Enhanced security through centralized management of authentication and authorization, provided by the integration of OpenLDAP and Apache Ranger
- Cost-effective infrastructure management and operation, thanks to the containerized nature of services on Amazon EKS
- Compliance with stringent data security and privacy regulations, due to Apache Ranger's policy enforcement capabilities

Deploy a big data cluster on Amazon EKS and configure Ranger for access control

In this section, we outline the process of deploying a big data cluster on Amazon EKS and configuring Ranger for access control. We use AWS CloudFormation templates for quick deployment of a big data environment on Amazon EKS with Apache Ranger. Complete the following steps:

- Upload the provided template to AWS CloudFormation, configure the stack options, and launch the stack to automate the deployment of the entire infrastructure, including the EKS cluster and Apache Ranger integration. After a few minutes, you'll have a fully functional big data environment with robust security management ready for your analytical workloads.
- On the AWS console, find the name of your EKS cluster. In this case, it's dep-demo-eks-cluster-ap-northeast-1.
- Update your kubeconfig and check pod status. For example:

  aws eks update-kubeconfig --name dep-eks-cluster-ap-northeast-1 --region ap-northeast-1

  # Check pod status
  kubectl get pods --namespace hadoop
  kubectl get pods --namespace platform
  kubectl get pods --namespace trino

After Ranger Admin is successfully forwarded to port 6080 of localhost (see the port-forward sketch at the end of this post), go to localhost:6080 in your browser. Log in with user name admin and the password you entered earlier. By default, two policies have already been created, Hive and Trino, granting all access to the LDAP user you created (depadmin in this case). The LDAP user sync service is also set up and will automatically sync all users from the LDAP service created in this template.

Example permission configuration

In a practical application within a company, permissions for tables and fields in the data warehouse are divided by business department, isolating sensitive data for different business units. This provides data security and the orderly conduct of daily business operations. The original article includes screenshots of an example business configuration, an example Apache Ranger permission configuration, and the users associated with roles.

When performing data queries, using Hive and Spark as examples, we can compare behavior before and after permission configuration. The original article shows examples of Hive SQL (running on Superset) with privileges denied, Spark SQL (running in an IDE) with privileges denied, and Spark SQL (running in an IDE) with permissions granted. Based on this example, and considering your enterprise requirements, it becomes feasible and flexible to manage permissions in the data warehouse effectively.

Conclusion

This post provided a comprehensive guide to permission management in big data, particularly on Amazon EKS using Apache Ranger, equipping you with the essential knowledge and tools for robust data security and management. By implementing the strategies and understanding the components detailed in this post, you can effectively manage permissions and implement data security and compliance in your big data environments.

About the Authors

Yuzhu Xiao is a Senior Data Development Engineer at Amber Group with extensive experience in cloud data platform architecture. He has many years of experience in AWS Cloud platform data architecture and development, primarily focusing on efficiency optimization and cost control of enterprise cloud architectures.

Xin Zhang is an AWS Solutions Architect, responsible for solution consulting and design based on the AWS Cloud platform. He has rich experience in R&D and architecture practice in the fields of system architecture, data warehousing, and real-time computing.

View the full article
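The walkthrough above assumes Ranger Admin has been forwarded to localhost:6080 but does not show the command. A minimal sketch, assuming the Ranger Admin service is named ranger-admin and runs in the platform namespace (the actual service name created by the template may differ):

  # Forward the Ranger Admin UI to localhost:6080
  kubectl port-forward -n platform svc/ranger-admin 6080:6080
  # Then open http://localhost:6080 and log in as the admin user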
  3. With Kubernetes 1.30, we (SIG Auth) are moving Structured Authorization Configuration to beta. Today's article is about authorization: deciding what someone can and cannot access. Check the previous article from yesterday to find out what's new in Kubernetes v1.30 around authentication (finding out who's performing a task, and checking that they are who they say they are).

Introduction

Kubernetes continues to evolve to meet the intricate requirements of system administrators and developers alike. A critical aspect of Kubernetes that ensures the security and integrity of the cluster is API server authorization. Until recently, the configuration of the authorization chain in kube-apiserver was somewhat rigid, limited to a set of command-line flags and allowing only a single webhook in the authorization chain. This approach, while functional, restricted the flexibility cluster administrators need to define complex, fine-grained authorization policies. The latest Structured Authorization Configuration feature (KEP-3221) aims to revolutionize this aspect by introducing a more structured and versatile way to configure the authorization chain, focusing on enabling multiple webhooks and providing explicit control mechanisms.

The need for improvement

Cluster administrators have long sought the ability to specify multiple authorization webhooks within the API server handler chain and to control detailed behavior like timeout and failure policy for each webhook. This need arises from the desire to create layered security policies, where requests can be validated against multiple criteria or sets of rules in a specific order. The previous limitations also made it difficult to dynamically configure the authorizer chain, leaving no room to manage complex authorization scenarios efficiently.

The Structured Authorization Configuration feature addresses these limitations by introducing a configuration file format for the Kubernetes API server authorization chain. This format allows specifying multiple webhooks in the authorization chain (all other authorization types are specified no more than once). Each webhook authorizer has well-defined parameters, including timeout settings, failure policies, and conditions for invocation, with CEL rules to pre-filter requests before they are dispatched to webhooks, helping you prevent unnecessary invocations. The configuration also supports automatic reloading, ensuring changes can be applied dynamically without restarting the kube-apiserver. This feature addresses current limitations and opens up new possibilities for securing and managing Kubernetes clusters more effectively.

Sample configurations

Here is a sample structured authorization configuration along with descriptions for all fields, their defaults, and possible values.

  apiVersion: apiserver.config.k8s.io/v1beta1
  kind: AuthorizationConfiguration
  authorizers:
    - type: Webhook
      # Name used to describe the authorizer
      # This is explicitly used in monitoring machinery for metrics
      # Note:
      #   - Validation for this field is similar to how K8s labels are validated today.
      # Required, with no default
      name: webhook
      webhook:
        # The duration to cache 'authorized' responses from the webhook
        # authorizer.
        # Same as setting `--authorization-webhook-cache-authorized-ttl` flag
        # Default: 5m0s
        authorizedTTL: 30s
        # The duration to cache 'unauthorized' responses from the webhook
        # authorizer.
        # Same as setting `--authorization-webhook-cache-unauthorized-ttl` flag
        # Default: 30s
        unauthorizedTTL: 30s
        # Timeout for the webhook request
        # Maximum allowed is 30s.
        # Required, with no default.
        timeout: 3s
        # The API version of the authorization.k8s.io SubjectAccessReview to
        # send to and expect from the webhook.
        # Same as setting `--authorization-webhook-version` flag
        # Required, with no default
        # Valid values: v1beta1, v1
        subjectAccessReviewVersion: v1
        # MatchConditionSubjectAccessReviewVersion specifies the SubjectAccessReview
        # version the CEL expressions are evaluated against
        # Valid values: v1
        # Required, no default value
        matchConditionSubjectAccessReviewVersion: v1
        # Controls the authorization decision when a webhook request fails to
        # complete or returns a malformed response or errors evaluating
        # matchConditions.
        # Valid values:
        #   - NoOpinion: continue to subsequent authorizers to see if one of
        #     them allows the request
        #   - Deny: reject the request without consulting subsequent authorizers
        # Required, with no default.
        failurePolicy: Deny
        connectionInfo:
          # Controls how the webhook should communicate with the server.
          # Valid values:
          #   - KubeConfig: use the file specified in kubeConfigFile to locate the
          #     server.
          #   - InClusterConfig: use the in-cluster configuration to call the
          #     SubjectAccessReview API hosted by kube-apiserver. This mode is not
          #     allowed for kube-apiserver.
          type: KubeConfig
          # Path to KubeConfigFile for connection info
          # Required, if connectionInfo.Type is KubeConfig
          kubeConfigFile: /kube-system-authz-webhook.yaml
        # matchConditions is a list of conditions that must be met for a request to be sent to this
        # webhook. An empty list of matchConditions matches all requests.
        # There are a maximum of 64 match conditions allowed.
        #
        # The exact matching logic is (in order):
        #   1. If at least one matchCondition evaluates to FALSE, then the webhook is skipped.
        #   2. If ALL matchConditions evaluate to TRUE, then the webhook is called.
        #   3. If at least one matchCondition evaluates to an error (but none are FALSE):
        #      - If failurePolicy=Deny, then the webhook rejects the request
        #      - If failurePolicy=NoOpinion, then the error is ignored and the webhook is skipped
        matchConditions:
          # expression represents the expression which will be evaluated by CEL. Must evaluate to bool.
          # CEL expressions have access to the contents of the SubjectAccessReview in v1 version.
          # If version specified by subjectAccessReviewVersion in the request variable is v1beta1,
          # the contents would be converted to the v1 version before evaluating the CEL expression.
          #
          # Documentation on CEL: https://kubernetes.io/docs/reference/using-api/cel/
          #
          # only send resource requests to the webhook
          - expression: has(request.resourceAttributes)
          # only intercept requests to kube-system
          - expression: request.resourceAttributes.namespace == 'kube-system'
          # don't intercept requests from kube-system service accounts
          - expression: !('system:serviceaccounts:kube-system' in request.user.groups)
    - type: Node
      name: node
    - type: RBAC
      name: rbac
    - type: Webhook
      name: in-cluster-authorizer
      webhook:
        authorizedTTL: 5m
        unauthorizedTTL: 30s
        timeout: 3s
        subjectAccessReviewVersion: v1
        failurePolicy: NoOpinion
        connectionInfo:
          type: InClusterConfig

The following configuration examples illustrate real-world scenarios that need the ability to specify multiple webhooks with distinct settings, precedence order, and failure modes.
Protecting Installed CRDs

Ensuring the availability of Custom Resource Definitions (CRDs) at cluster startup has been a key demand. One of the blockers to having a controller reconcile those CRDs is having a protection mechanism for them, which can be achieved through multiple authorization webhooks. This was not possible before, as specifying multiple authorization webhooks in the Kubernetes API server authorization chain was simply not supported. Now, with the Structured Authorization Configuration feature, administrators can specify multiple webhooks, offering a solution where RBAC falls short, especially when denying permissions to 'non-system' users for certain CRDs.

Assuming the following for this scenario: the "protected" CRDs are installed, and they can only be modified by users in the group admin.

  apiVersion: apiserver.config.k8s.io/v1beta1
  kind: AuthorizationConfiguration
  authorizers:
    - type: Webhook
      name: system-crd-protector
      webhook:
        unauthorizedTTL: 30s
        timeout: 3s
        subjectAccessReviewVersion: v1
        matchConditionSubjectAccessReviewVersion: v1
        failurePolicy: Deny
        connectionInfo:
          type: KubeConfig
          kubeConfigFile: /files/kube-system-authz-webhook.yaml
        matchConditions:
          # only send resource requests to the webhook
          - expression: has(request.resourceAttributes)
          # only intercept requests for CRDs
          - expression: request.resourceAttributes.resource.resource = "customresourcedefinitions"
          - expression: request.resourceAttributes.resource.group = ""
          # only intercept update, patch, delete, or deletecollection requests
          - expression: request.resourceAttributes.verb in ['update', 'patch', 'delete','deletecollection']
    - type: Node
    - type: RBAC

Preventing unnecessarily nested webhooks

A system administrator wants to apply specific validations to requests before handing them off to webhooks using frameworks like Open Policy Agent. In the past, this would require running nested webhooks within the one added to the authorization chain to achieve the desired result. The Structured Authorization Configuration feature simplifies this process, offering a structured API to selectively trigger additional webhooks when needed. It also enables administrators to set distinct failure policies for each webhook, ensuring more consistent and predictable responses.
  apiVersion: apiserver.config.k8s.io/v1beta1
  kind: AuthorizationConfiguration
  authorizers:
    - type: Webhook
      name: system-crd-protector
      webhook:
        unauthorizedTTL: 30s
        timeout: 3s
        subjectAccessReviewVersion: v1
        matchConditionSubjectAccessReviewVersion: v1
        failurePolicy: Deny
        connectionInfo:
          type: KubeConfig
          kubeConfigFile: /files/kube-system-authz-webhook.yaml
        matchConditions:
          # only send resource requests to the webhook
          - expression: has(request.resourceAttributes)
          # only intercept requests for CRDs
          - expression: request.resourceAttributes.resource.resource = "customresourcedefinitions"
          - expression: request.resourceAttributes.resource.group = ""
          # only intercept update, patch, delete, or deletecollection requests
          - expression: request.resourceAttributes.verb in ['update', 'patch', 'delete','deletecollection']
    - type: Node
    - type: RBAC
    - name: opa
      type: Webhook
      webhook:
        unauthorizedTTL: 30s
        timeout: 3s
        subjectAccessReviewVersion: v1
        matchConditionSubjectAccessReviewVersion: v1
        failurePolicy: Deny
        connectionInfo:
          type: KubeConfig
          kubeConfigFile: /files/opa-default-authz-webhook.yaml
        matchConditions:
          # only send resource requests to the webhook
          - expression: has(request.resourceAttributes)
          # only intercept requests to default namespace
          - expression: request.resourceAttributes.namespace == 'default'
          # don't intercept requests from default service accounts
          - expression: !('system:serviceaccounts:default' in request.user.groups)

What's next?

From Kubernetes 1.30, the feature is in beta and enabled by default. For Kubernetes v1.31, we expect the feature to stay in beta while we get more feedback from users. Once it is ready for GA, the feature flag will be removed, and the configuration file version will be promoted to v1. Learn more about this feature on the structured authorization configuration Kubernetes doc website. You can also follow along with KEP-3221 to track progress in coming Kubernetes releases.

Call to action

In this post, we have covered the benefits of the Structured Authorization Configuration feature in Kubernetes v1.30 and a few sample configurations for real-world scenarios. To use this feature, you must specify the path to the authorization configuration using the --authorization-config command line argument. From Kubernetes 1.30, the feature is in beta and enabled by default. If you want to keep using command line flags instead of a configuration file, those will continue to work as-is. Specifying both --authorization-config and --authorization-modes/--authorization-webhook-* won't work; you need to drop the older flags from your kube-apiserver command.

The following kind Cluster configuration sets that command argument on the API server to load an AuthorizationConfiguration from a file (authorization_config.yaml) in the files folder. Any needed kubeconfig and certificate files can also be put in the files directory.

  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  featureGates:
    StructuredAuthorizationConfiguration: true  # enabled by default in v1.30
  kubeadmConfigPatches:
    - |
      kind: ClusterConfiguration
      metadata:
        name: config
      apiServer:
        extraArgs:
          authorization-config: "/files/authorization_config.yaml"
        extraVolumes:
          - name: files
            hostPath: "/files"
            mountPath: "/files"
            readOnly: true
  nodes:
    - role: control-plane
      extraMounts:
        - hostPath: files
          containerPath: /files

We would love to hear your feedback on this feature.
In particular, we would like feedback from Kubernetes cluster administrators and authorization webhook implementors as they build their integrations with this new API. Please reach out to us on the #sig-auth-authorizers-dev channel on Kubernetes Slack.

How to get involved

If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Auth projects, please reach out on the #sig-auth channel on Kubernetes Slack. You are also welcome to join the bi-weekly SIG Auth meetings held every other Wednesday.

Acknowledgments

This feature was driven by contributors from several different companies. We would like to extend a huge thank you to everyone who contributed their time and effort to make this possible. View the full article
  4. With Kubernetes 1.30, we (SIG Auth) are moving Structured Authentication Configuration to beta. Today's article is about authentication: finding out who's performing a task, and checking that they are who they say they are. Check back in tomorrow to find out what's new in Kubernetes v1.30 around authorization (deciding what someone can and can't access).

Motivation

Kubernetes has had a long-standing need for a more flexible and extensible authentication system. The current system, while powerful, has some limitations that make it difficult to use in certain scenarios. For example, it is not possible to use multiple authenticators of the same type (e.g., multiple JWT authenticators) or to change the configuration without restarting the API server. The Structured Authentication Configuration feature is the first step towards addressing these limitations and providing a more flexible and extensible way to configure authentication in Kubernetes.

What is structured authentication configuration?

Kubernetes v1.30 builds on the experimental support for configuring authentication based on a file, which was added as alpha in Kubernetes v1.29. At this beta stage, Kubernetes only supports configuring JWT authenticators, which serve as the next iteration of the existing OIDC authenticator. A JWT authenticator authenticates Kubernetes users using JWT-compliant tokens: it attempts to parse a raw ID token and verify that it was signed by the configured issuer. The Kubernetes project added configuration from a file so that it can provide more flexibility than command line options (which continue to work and are still supported). Supporting a configuration file also makes it easy to deliver further improvements in upcoming releases.

Benefits of structured authentication configuration

Here's why using a configuration file to configure cluster authentication is a benefit:

- Multiple JWT authenticators: You can configure multiple JWT authenticators simultaneously. This allows you to use multiple identity providers (e.g., Okta, Keycloak, GitLab) without needing an intermediary like Dex to handle multiplexing between them.
- Dynamic configuration: You can change the configuration without restarting the API server. This allows you to add, remove, or modify authenticators without disrupting the API server.
- Any JWT-compliant token: You can use any JWT-compliant token for authentication. This allows you to use tokens from any identity provider that supports JWT. The minimum valid JWT payload must contain the claims documented on the structured authentication configuration page in the Kubernetes documentation.
- CEL (Common Expression Language) support: You can use CEL to determine whether the token's claims match the user's attributes in Kubernetes (e.g., username, group). This allows you to use complex logic to determine whether a token is valid.
- Multiple audiences: You can configure multiple audiences for a single authenticator. This allows you to use the same authenticator for multiple audiences, such as using a different OAuth client for kubectl and the dashboard.
- Identity providers that don't support OpenID Connect discovery: You can use identity providers that don't support OpenID Connect discovery. The only requirement is to host the discovery document at a different location than the issuer (such as locally in the cluster) and specify the issuer.discoveryURL in the configuration file.
How to use structured authentication configuration

To use structured authentication configuration, you specify the path to the authentication configuration using the --authentication-config command line argument in the API server. The configuration file is a YAML file that specifies the authenticators and their configuration. Here is an example configuration file that configures two JWT authenticators:

  apiVersion: apiserver.config.k8s.io/v1beta1
  kind: AuthenticationConfiguration
  # Someone with a valid token from either of these issuers could authenticate
  # against this cluster.
  jwt:
    - issuer:
        url: https://issuer1.example.com
        audiences:
          - audience1
          - audience2
        audienceMatchPolicy: MatchAny
      claimValidationRules:
        - expression: 'claims.hd == "example.com"'
          message: "the hosted domain name must be example.com"
      claimMappings:
        username:
          expression: 'claims.username'
        groups:
          expression: 'claims.groups'
        uid:
          expression: 'claims.uid'
        extra:
          - key: 'example.com/tenant'
            expression: 'claims.tenant'
      userValidationRules:
        - expression: "!user.username.startsWith('system:')"
          message: "username cannot use reserved system: prefix"
    # second authenticator that exposes the discovery document at a different location
    # than the issuer
    - issuer:
        url: https://issuer2.example.com
        discoveryURL: https://discovery.example.com/.well-known/openid-configuration
        audiences:
          - audience3
          - audience4
        audienceMatchPolicy: MatchAny
      claimValidationRules:
        - expression: 'claims.hd == "example.com"'
          message: "the hosted domain name must be example.com"
      claimMappings:
        username:
          expression: 'claims.username'
        groups:
          expression: 'claims.groups'
        uid:
          expression: 'claims.uid'
        extra:
          - key: 'example.com/tenant'
            expression: 'claims.tenant'
      userValidationRules:
        - expression: "!user.username.startsWith('system:')"
          message: "username cannot use reserved system: prefix"

Migration from command line arguments to configuration file

The Structured Authentication Configuration feature is designed to be backwards-compatible with the existing approach, based on command line options, for configuring the JWT authenticator. This means that you can continue to use the existing command-line options to configure the JWT authenticator. However, we (Kubernetes SIG Auth) recommend migrating to the new configuration file-based approach, as it provides more flexibility and extensibility.

Note: If you specify --authentication-config along with any of the --oidc-* command line arguments, this is a misconfiguration. In this situation, the API server reports an error and then immediately exits. If you want to switch to using structured authentication configuration, you have to remove the --oidc-* command line arguments and use the configuration file instead.

Here is an example of how to migrate from the command-line flags to the configuration file:

Command-line arguments

  --oidc-issuer-url=https://issuer.example.com
  --oidc-client-id=example-client-id
  --oidc-username-claim=username
  --oidc-groups-claim=groups
  --oidc-username-prefix=oidc:
  --oidc-groups-prefix=oidc:
  --oidc-required-claim="hd=example.com"
  --oidc-required-claim="admin=true"
  --oidc-ca-file=/path/to/ca.pem

There is no equivalent in the configuration file for the --oidc-signing-algs flag. For Kubernetes v1.30, the authenticator supports all the asymmetric algorithms listed in oidc.go.
Configuration file

  apiVersion: apiserver.config.k8s.io/v1beta1
  kind: AuthenticationConfiguration
  jwt:
    - issuer:
        url: https://issuer.example.com
        audiences:
          - example-client-id
        certificateAuthority: <value is the content of file /path/to/ca.pem>
      claimMappings:
        username:
          claim: username
          prefix: "oidc:"
        groups:
          claim: groups
          prefix: "oidc:"
      claimValidationRules:
        - claim: hd
          requiredValue: "example.com"
        - claim: admin
          requiredValue: "true"

What's next?

For Kubernetes v1.31, we expect the feature to stay in beta while we get more feedback. In the coming releases, we want to investigate:

- Making distributed claims work via CEL expressions.
- Egress selector configuration support for calls to issuer.url and issuer.discoveryURL.

You can learn more about this feature on the structured authentication configuration page in the Kubernetes documentation. You can also follow along on KEP-3331 to track progress across the coming Kubernetes releases.

Try it out

In this post, I have covered the benefits the Structured Authentication Configuration feature brings in Kubernetes v1.30. To use this feature, you must specify the path to the authentication configuration using the --authentication-config command line argument. From Kubernetes v1.30, the feature is in beta and enabled by default. If you want to keep using command line arguments instead of a configuration file, those will continue to work as-is.

We would love to hear your feedback on this feature. Please reach out to us on the #sig-auth-authenticators-dev channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).

How to get involved

If you are interested in getting involved in the development of this feature, sharing feedback, or participating in any other ongoing SIG Auth projects, please reach out on the #sig-auth channel on Kubernetes Slack. You are also welcome to join the bi-weekly SIG Auth meetings held every other Wednesday. View the full article
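By analogy with the kind Cluster example shown in the companion authorization article (item 3 in these results), a local kind cluster could wire up the --authentication-config flag roughly as follows. This is a sketch only; the file paths are illustrative, and the StructuredAuthenticationConfiguration feature gate is enabled by default in v1.30:

  kind: Cluster
  apiVersion: kind.x-k8s.io/v1alpha4
  featureGates:
    StructuredAuthenticationConfiguration: true   # enabled by default in v1.30
  kubeadmConfigPatches:
    - |
      kind: ClusterConfiguration
      apiServer:
        extraArgs:
          authentication-config: "/files/authentication_config.yaml"
        extraVolumes:
          - name: files
            hostPath: "/files"
            mountPath: "/files"
            readOnly: true
  nodes:
    - role: control-plane
      extraMounts:
        - hostPath: files
          containerPath: /files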
  5. On behalf of the Kubernetes project, I am excited to announce that ValidatingAdmissionPolicy has reached general availability as part of the Kubernetes 1.30 release. If you have not yet read about this new declarative alternative to validating admission webhooks, it may be interesting to read our previous post about the new feature. If you have already heard about ValidatingAdmissionPolicies and you are eager to try them out, there is no better time to do it than now. Let's have a taste of a ValidatingAdmissionPolicy by replacing a simple webhook.

Example admission webhook

First, let's take a look at an example of a simple webhook. Here is an excerpt from a webhook that enforces runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation, and privileged to be set to the least permissive values.

  func verifyDeployment(deploy *appsv1.Deployment) error {
      var errs []error
      for i, c := range deploy.Spec.Template.Spec.Containers {
          if c.Name == "" {
              return fmt.Errorf("container %d has no name", i)
          }
          if c.SecurityContext == nil {
              errs = append(errs, fmt.Errorf("container %q does not have SecurityContext", c.Name))
          }
          if c.SecurityContext.RunAsNonRoot == nil || !*c.SecurityContext.RunAsNonRoot {
              errs = append(errs, fmt.Errorf("container %q must set RunAsNonRoot to true in its SecurityContext", c.Name))
          }
          if c.SecurityContext.ReadOnlyRootFilesystem == nil || !*c.SecurityContext.ReadOnlyRootFilesystem {
              errs = append(errs, fmt.Errorf("container %q must set ReadOnlyRootFilesystem to true in its SecurityContext", c.Name))
          }
          if c.SecurityContext.AllowPrivilegeEscalation != nil && *c.SecurityContext.AllowPrivilegeEscalation {
              errs = append(errs, fmt.Errorf("container %q must NOT set AllowPrivilegeEscalation to true in its SecurityContext", c.Name))
          }
          if c.SecurityContext.Privileged != nil && *c.SecurityContext.Privileged {
              errs = append(errs, fmt.Errorf("container %q must NOT set Privileged to true in its SecurityContext", c.Name))
          }
      }
      return errors.NewAggregate(errs)
  }

Check out What are admission webhooks? Or, see the full code of this webhook to follow along with this walkthrough.

The policy

Now let's try to recreate the validation faithfully with a ValidatingAdmissionPolicy.

  apiVersion: admissionregistration.k8s.io/v1
  kind: ValidatingAdmissionPolicy
  metadata:
    name: "pod-security.policy.example.com"
  spec:
    failurePolicy: Fail
    matchConstraints:
      resourceRules:
        - apiGroups: ["apps"]
          apiVersions: ["v1"]
          operations: ["CREATE", "UPDATE"]
          resources: ["deployments"]
    validations:
      - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
        message: 'all containers must set runAsNonRoot to true'
      - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
        message: 'all containers must set readOnlyRootFilesystem to true'
      - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
        message: 'all containers must NOT set allowPrivilegeEscalation to true'
      - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
        message: 'all containers must NOT set privileged to true'

Create the policy with kubectl. Great, no complaint so far. But let's get the policy object back and take a look at its status.
  kubectl get -oyaml validatingadmissionpolicies/pod-security.policy.example.com

  status:
    typeChecking:
      expressionWarnings:
      - fieldRef: spec.validations[3].expression
        warning: |
          apps/v1, Kind=Deployment: ERROR: <input>:1:76: undefined field 'Privileged'
           | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
           | ...........................................................................^
          ERROR: <input>:1:128: undefined field 'Privileged'
           | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
           | ...............................................................................................................................^

The policy was checked against its matched type, which is apps/v1.Deployment. Looking at the fieldRef, the problem was with the expression at index 3 (indexes start at 0). The expression in question accessed an undefined Privileged field. Ahh, looks like it was a copy-and-paste error. The field name should be in lowercase.

  apiVersion: admissionregistration.k8s.io/v1
  kind: ValidatingAdmissionPolicy
  metadata:
    name: "pod-security.policy.example.com"
  spec:
    failurePolicy: Fail
    matchConstraints:
      resourceRules:
        - apiGroups: ["apps"]
          apiVersions: ["v1"]
          operations: ["CREATE", "UPDATE"]
          resources: ["deployments"]
    validations:
      - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
        message: 'all containers must set runAsNonRoot to true'
      - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
        message: 'all containers must set readOnlyRootFilesystem to true'
      - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
        message: 'all containers must NOT set allowPrivilegeEscalation to true'
      - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.privileged) || !c.securityContext.privileged)
        message: 'all containers must NOT set privileged to true'

Check its status again, and you should see all warnings cleared.

Next, let's create a namespace for our tests.

  kubectl create namespace policy-test

Then, I bind the policy to the namespace. But at this point, I set the action to Warn so that the policy prints out warnings instead of rejecting the requests. This is especially useful to collect results from all expressions during development and automated testing.

  apiVersion: admissionregistration.k8s.io/v1
  kind: ValidatingAdmissionPolicyBinding
  metadata:
    name: "pod-security.policy-binding.example.com"
  spec:
    policyName: "pod-security.policy.example.com"
    validationActions: ["Warn"]
    matchResources:
      namespaceSelector:
        matchLabels:
          "kubernetes.io/metadata.name": "policy-test"

Test out policy enforcement.
  kubectl create -n policy-test -f- <<EOF
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    labels:
      app: nginx
    name: nginx
  spec:
    selector:
      matchLabels:
        app: nginx
    template:
      metadata:
        labels:
          app: nginx
      spec:
        containers:
        - image: nginx
          name: nginx
          securityContext:
            privileged: true
            allowPrivilegeEscalation: true
  EOF

  Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set runAsNonRoot to true
  Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set readOnlyRootFilesystem to true
  Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set allowPrivilegeEscalation to true
  Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set privileged to true
  Error from server: error when creating "STDIN": admission webhook "webhook.example.com" denied the request: [container "nginx" must set RunAsNonRoot to true in its SecurityContext, container "nginx" must set ReadOnlyRootFilesystem to true in its SecurityContext, container "nginx" must NOT set AllowPrivilegeEscalation to true in its SecurityContext, container "nginx" must NOT set Privileged to true in its SecurityContext]

Looks great! The policy and the webhook give equivalent results. After a few other cases, when we are confident with our policy, maybe it is time to do some cleanup:

- For every expression, we repeat access to object.spec.template.spec.containers and to each securityContext.
- There is a pattern of checking the presence of a field and then accessing it, which looks a bit verbose.

Fortunately, since Kubernetes 1.28, we have new solutions for both issues. Variable composition allows us to extract repeated sub-expressions into their own variables, and Kubernetes enables the optional library for CEL, which is excellent for working with fields that are, you guessed it, optional. With both features in mind, let's refactor the policy a bit.

  apiVersion: admissionregistration.k8s.io/v1
  kind: ValidatingAdmissionPolicy
  metadata:
    name: "pod-security.policy.example.com"
  spec:
    failurePolicy: Fail
    matchConstraints:
      resourceRules:
        - apiGroups: ["apps"]
          apiVersions: ["v1"]
          operations: ["CREATE", "UPDATE"]
          resources: ["deployments"]
    variables:
      - name: containers
        expression: object.spec.template.spec.containers
      - name: securityContexts
        expression: 'variables.containers.map(c, c.?securityContext)'
    validations:
      - expression: variables.securityContexts.all(c, c.?runAsNonRoot == optional.of(true))
        message: 'all containers must set runAsNonRoot to true'
      - expression: variables.securityContexts.all(c, c.?readOnlyRootFilesystem == optional.of(true))
        message: 'all containers must set readOnlyRootFilesystem to true'
      - expression: variables.securityContexts.all(c, c.?allowPrivilegeEscalation != optional.of(true))
        message: 'all containers must NOT set allowPrivilegeEscalation to true'
      - expression: variables.securityContexts.all(c, c.?privileged != optional.of(true))
        message: 'all containers must NOT set privileged to true'

The policy is now much cleaner and more readable. Update the policy, and you should see it function the same as before.
Now let's change the policy binding from warning to actually denying requests that fail validation.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "pod-security.policy-binding.example.com"
spec:
  policyName: "pod-security.policy.example.com"
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        "kubernetes.io/metadata.name": "policy-test"

And finally, remove the webhook. Now the result should include only messages from the policy.

kubectl create -n policy-test -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
EOF

The deployments "nginx" is invalid: : ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com' denied request: all containers must set runAsNonRoot to true

Please notice that, by design, the policy stops evaluation after the first expression that causes the request to be denied. This is different from what happens when the expressions generate only warnings.

Set up monitoring

Unlike a webhook, a policy is not a dedicated process that can expose its own metrics. Instead, you can use metrics from the API server in their place. Here are some examples, in Prometheus Query Language, of common monitoring tasks.

To find the 95th percentile execution duration of the policy shown above:

histogram_quantile(0.95, sum(rate(apiserver_validating_admission_policy_check_duration_seconds_bucket{policy="pod-security.policy.example.com"}[5m])) by (le))

To find the rate of policy evaluations:

rate(apiserver_validating_admission_policy_check_total{policy="pod-security.policy.example.com"}[5m])

You can read the metrics reference to learn more about the metrics above. The metrics for ValidatingAdmissionPolicy are currently in alpha; more and better metrics will arrive as the feature graduates to stable in future releases. View the full article
  6. Amazon CloudWatch Container Insights with Enhanced Observability for EKS now auto-discovers critical health metrics from your AWS accelerators Trainium and Inferentia, and AWS high performance network adapters (Elastic Fabric Adapters) as well as NVIDIA GPUs. You can visualize these out-of-the-box metrics in curated Container Insights dashboards to help monitor your accelerated infrastructure and optimize your AI workloads for operational excellence. View the full article
  7. Author: Akihiro Suda (NTT) Read-only volume mounts have been a feature of Kubernetes since the beginning. Surprisingly, read-only mounts are not completely read-only under certain conditions on Linux. As of the v1.30 release, they can be made completely read-only, with alpha support for recursive read-only mounts. Read-only volume mounts are not really read-only by default Volume mounts can be deceptively complicated. You might expect that the following manifest makes everything under /mnt in the containers read-only: --- apiVersion: v1 kind: Pod spec: volumes: - name: mnt hostPath: path: /mnt containers: - volumeMounts: - name: mnt mountPath: /mnt readOnly: true However, any sub-mounts beneath /mnt may still be writable! For example, consider that /mnt/my-nfs-server is writeable on the host. Inside the container, writes to /mnt/* will be rejected but /mnt/my-nfs-server/* will still be writeable. New mount option: recursiveReadOnly Kubernetes 1.30 added a new mount option recursiveReadOnly so as to make submounts recursively read-only. The option can be enabled as follows: --- apiVersion: v1 kind: Pod spec: volumes: - name: mnt hostPath: path: /mnt containers: - volumeMounts: - name: mnt mountPath: /mnt readOnly: true # NEW # Possible values are `Enabled`, `IfPossible`, and `Disabled`. # Needs to be specified in conjunction with `readOnly: true`. recursiveReadOnly: Enabled This is implemented by applying the MOUNT_ATTR_RDONLY attribute with the AT_RECURSIVE flag using mount_setattr(2) added in Linux kernel v5.12. For backwards compatibility, the recursiveReadOnly field is not a replacement for readOnly, but is used in conjunction with it. To get a properly recursive read-only mount, you must set both fields. Feature availability To enable recursiveReadOnly mounts, the following components have to be used: Kubernetes: v1.30 or later, with the RecursiveReadOnlyMounts feature gate enabled. As of v1.30, the gate is marked as alpha. CRI runtime: containerd: v2.0 or later OCI runtime: runc: v1.1 or later crun: v1.8.6 or later Linux kernel: v5.12 or later What's next? Kubernetes SIG Node hope - and expect - that the feature will be promoted to beta and eventually general availability (GA) in future releases of Kubernetes, so that users no longer need to enable the feature gate manually. The default value of recursiveReadOnly will still remain Disabled, for backwards compatibility. How can I learn more? Please check out the documentation for the further details of recursiveReadOnly mounts. How to get involved? This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you! View the full article
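To check the behaviour described above from inside a running container, you can list the mounts recursively. This is a minimal sketch; the pod name is illustrative, and it assumes the container image ships the findmnt utility.

# Show /mnt and all of its submounts with their mount options.
kubectl exec rro-demo -- findmnt -R /mnt -o TARGET,OPTIONS
# With recursiveReadOnly: Enabled, submounts such as /mnt/my-nfs-server are expected
# to show the "ro" option; with plain readOnly: true, only /mnt itself is read-only.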
  8. Authors: Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat) Linux provides different namespaces to isolate processes from each other. For example, a typical Kubernetes pod runs within a network namespace to isolate the network identity and a PID namespace to isolate the processes. One Linux namespace that was left behind is the user namespace. This namespace allows us to isolate the user and group identifiers (UIDs and GIDs) we use inside the container from the ones on the host. This is a powerful abstraction that allows us to run containers as "root": we are root inside the container and can do everything root can inside the pod, but our interactions with the host are limited to what a non-privileged user can do. This is great for limiting the impact of a container breakout. A container breakout is when a process inside a container can break out onto the host using some unpatched vulnerability in the container runtime or the kernel and can access/modify files on the host or other containers. If we run our pods with user namespaces, the privileges the container has over the rest of the host are reduced, and the files outside the container it can access are limited too. In Kubernetes v1.25, we introduced support for user namespaces only for stateless pods. Kubernetes 1.28 lifted that restriction, and now, with Kubernetes 1.30, we are moving to beta! What is a user namespace? Note: Linux user namespaces are a different concept from Kubernetes namespaces. The former is a Linux kernel feature; the latter is a Kubernetes feature. User namespaces are a Linux feature that isolates the UIDs and GIDs of the containers from the ones on the host. The identifiers in the container can be mapped to identifiers on the host in a way where the host UID/GIDs used for different containers never overlap. Furthermore, the identifiers can be mapped to unprivileged, non-overlapping UIDs and GIDs on the host. This brings two key benefits: Prevention of lateral movement: As the UIDs and GIDs for different containers are mapped to different UIDs and GIDs on the host, containers have a harder time attacking each other, even if they escape the container boundaries. For example, suppose container A runs with different UIDs and GIDs on the host than container B. In that case, the operations it can do on container B's files and processes are limited: only read/write what a file allows to others, as it will never have permission owner or group permission (the UIDs/GIDs on the host are guaranteed to be different for different containers). Increased host isolation: As the UIDs and GIDs are mapped to unprivileged users on the host, if a container escapes the container boundaries, even if it runs as root inside the container, it has no privileges on the host. This greatly protects what host files it can read/write, which process it can send signals to, etc. Furthermore, capabilities granted are only valid inside the user namespace and not on the host, limiting the impact a container escape can have. User namespace IDs allocation Without using a user namespace, a container running as root in the case of a container breakout has root privileges on the node. If some capabilities were granted to the container, the capabilities are valid on the host too. None of this is true when using user namespaces (modulo bugs, of course ). 
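As a minimal sketch of how this looks in practice, a user namespace is requested per pod by setting hostUsers: false in the pod spec (the pod name and image below are illustrative, and the cluster must meet the node requirements discussed later in the post):

apiVersion: v1
kind: Pod
metadata:
  name: userns-demo            # illustrative name
spec:
  hostUsers: false             # run this pod in its own user namespace
  containers:
  - name: app
    image: nginx               # illustrative image; processes run as "root" inside the pod,
                               # but map to an unprivileged UID/GID range on the host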
Changes in 1.30 In Kubernetes 1.30, besides moving user namespaces to beta, the contributors working on this feature: Introduced a way for the kubelet to use custom ranges for the UIDs/GIDs mapping Have added a way for Kubernetes to enforce that the runtime supports all the features needed for user namespaces. If they are not supported, Kubernetes will show a clear error when trying to create a pod with user namespaces. Before 1.30, if the container runtime didn't support user namespaces, the pod could be created without a user namespace. Added more tests, including tests in the cri-tools repository. You can check the documentation on user namespaces for how to configure custom ranges for the mapping. Demo A few months ago, CVE-2024-21626 was disclosed. This vulnerability score is 8.6 (HIGH). It allows an attacker to escape a container and read/write to any path on the node and other pods hosted on the same node. Rodrigo created a demo that exploits CVE 2024-21626 and shows how the exploit, which works without user namespaces, is mitigated when user namespaces are in use. Please note that with user namespaces, an attacker can do on the host file system what the permission bits for "others" allow. Therefore, the CVE is not completely prevented, but the impact is greatly reduced. Node system requirements There are requirements on the Linux kernel version and the container runtime to use this feature. On Linux you need Linux 6.3 or greater. This is because the feature relies on a kernel feature named idmap mounts, and support for using idmap mounts with tmpfs was merged in Linux 6.3. Suppose you are using CRI-O with crun; as always, you can expect support for Kubernetes 1.30 with CRI-O 1.30. Please note you also need crun 1.9 or greater. If you are using CRI-O with runc, this is still not supported. Containerd support is currently targeted for containerd 2.0, and the same crun version requirements apply. If you are using containerd with runc, this is still not supported. Please note that containerd 1.7 added experimental support for user namespaces, as implemented in Kubernetes 1.25 and 1.26. We did a redesign in Kubernetes 1.27, which requires changes in the container runtime. Those changes are not present in containerd 1.7, so it only works with user namespaces support in Kubernetes 1.25 and 1.26. Another limitation of containerd 1.7 is that it needs to change the ownership of every file and directory inside the container image during Pod startup. This has a storage overhead and can significantly impact the container startup latency. Containerd 2.0 will probably include an implementation that will eliminate the added startup latency and storage overhead. Consider this if you plan to use containerd 1.7 with user namespaces in production. None of these containerd 1.7 limitations apply to CRI-O. How do I get involved? You can reach SIG Node by several means: Slack: #sig-node Mailing list Open Community Issues/PRs You can also contact us directly: GitHub: @rata @giuseppe @saschagrunert Slack: @rata @giuseppe @sascha View the full article
  9. Today, we are excited to announce that customers will now be able to use Apache Livy to submit their Apache Spark jobs to Amazon EMR on EKS, in addition to using StartJobRun API, Spark Operator, Spark Submit and Interactive Endpoints. With this launch, customers will be able to use a REST interface to easily submit Spark jobs or snippets of Spark code, retrieve results synchronously or asynchronously while continuing to get all of the Amazon EMR on EKS benefits such as EMR optimized Spark runtime, SSL secured Livy endpoint, programmatic set-up experience etc. View the full article
  10. This post was coauthored by Venkatesh Nannan, Sr. Engineering Manager at Rippling Introduction Rippling is a workforce management system that eliminates the friction of running a business, combining HR, IT, and Finance apps on a unified data platform. Rippling’s mission is to free up intelligent people to work on hard problems. Existing Stack Rippling uses a modular monolith architecture with different Docker entrypoints for multiple services and background jobs. These components are managed within a single, large, multi-tenant production cluster on Amazon Elastic Kubernetes Service (Amazon EKS per region), on a scale of over 1000 nodes. Rippling’s infra stack consists of: Karpenter for cluster autoscaling – a flexible, high-performance Kubernetes cluster autoscaler making sure of optimal compute capacity. Horizontal Pod Autoscaler for scaling Kubernetes pods based on demand. KEDA, an event-driven autoscaler for scaling background job processing containers based on event volume. IAM Roles for Service Accounts (IRSA) provide temporary AWS Identity and Access Management (IAM) credentials to the Kubernetes pod, enabling access to AWS resources such as Amazon Simple Storage Service (Amazon S3) buckets, etc. Argo CD, an open-source, GitOps continuous delivery tool, deploys applications and add-on software to the Kubernetes cluster. AWS Load Balancer Controller exposes Kubernetes services to end-users. TargetGroupBinding Custom Resource binds pods to Application Load Balancer (ALB) target groups. Amazon EKS managed node groups spanning across multiple Availability Zones (AZs). In addition to these technologies, we were using Cilium CNI for controlling network traffic between pods. However, we were running into challenges with this part of our stack, so we decided to look for the following alternatives. Figure 1: High level architecture of Rippling Challenges As Amazon EKS version 1.23 approached end-of-life, upgrading to v1.27 became imperative. However, during our initial attempts at upgrading to v1.24 in our non-production environment, we encountered a significant hurdle. New nodes running Cilium failed to join the cluster, increasing our downtime and requiring operational work on the CNI plugin. As a company, we prioritize using managed services to streamline operations and focus on adding value to our business. This Kubernetes upgrade task gave us an opportunity to look at alternatives that would be easier to maintain. We saw that AWS had just announced the VPC CNI support for k8s network policies using eBPF. We realized that migrating to this solution would enable us to replace our third-party networking add-on and solely rely on VPC CNI for both cluster networking and network policy implementation. This change would help reduce the overhead of managing operational software needed for cluster networking. Introduction of Amazon VPC CNI support for network policies When AWS announced VPC CNI support for k8s Network Policies using eBPF, we wanted to use the Amazon VPC CNI to secure the traffic in our Kubernetes clusters and simplify our EKS cluster management and operations. As network policy agents are bundled in existing VPC CNI pods, we would no longer need to run additional daemon pods and network policy controllers on the worker nodes. We followed the blue-green cluster upgrade strategy and were able to safely migrate the traffic from the old cluster to the new cluster with minimal risk of breaking existing workloads. 
Planning the migration We did an inventory of the applied network policies in our existing cluster and the various ingress/egress features used. This helped us identify deviations from upstream K8s Network Policies. This is necessary for migrating, as Amazon VPC CNI supports only the upstream k8s network policies as of this writing. Rippling was not using advanced features from our third-party network policy engines such as Global Network Policies, DNS based policy rules, or rule priority. Therefore, we did not need Custom Resource Definition (CRD) transformations going into the migration process. AWS recommends converting third-party NetworkPolicy CRDs to Kubernetes Network Policy resources and testing the converted policies in a separate test cluster before migrating from third-party to VPC CNI Network Policy engine in production. To assist in the migration process, AWS has developed a tool called K8s Network Policy Migrator that converts existing supported Calico/Cilium network policy CRDs to Kubernetes native network policies. After conversion you can directly test the converted network policies on your new clusters running VPC CNI network policy controller. The tool is designed to help streamline the migration process and make sure of a smooth transition. Picking migration strategy There are broadly two strategies to migrate the CNI plugin in the EKS cluster: (1) In-place and (2) Blue-Green. The in-Place strategy replaces an existing third-party CNI plugin with the VPC CNI plugin with network policy support in an existing EKS cluster. This would entail the following steps: Creating a new label “cni-plugin=3p” on the existing Amazon EKS managed node groups and Karpenter NodePool resources. Updating the existing third-party CNI DaemonSet to schedule CNI pods on those labeled nodes. Deploy the Amazon EKS Add-on version of Amazon VPC CNI and schedule them to nodes without the “cni-plugin” label. At this point the existing nodes have third-party CNI plugin pods and not the VPC CNI pods. Launch new Amazon EKS Managed node groups, Karpenter NodePool resources without the “cni-plugin=3p” label so that VPC CNI pods can be scheduled to those nodes. Drain and delete the existing Amazon EKS managed node groups and Karpenter NodePool resources to move the workloads to the new worker nodes with VPC CNI. Finally, delete the third-party CNI and associated network policy controllers from the cluster. As you can see, this process is involved, needs careful orchestration, and is more prone to errors that impact the application availability. The second approach is to use the Blue-Green strategy, in which a new EKS cluster is launched with the VPC CNI plugin and then the workloads are migrated to it. This approach is safer since it can be rolled back and provides the ability to test the setup in isolation before routing the live production traffic. Therefore, we chose the Blue-Green strategy for our migration. Migration As part of the blue-green strategy, we created a new EKS cluster with the Amazon VPC CNI and enabled Network Policy support by customizing the VPC CNI Amazon EKS add-on configuration. We also deployed the Argo CD agent on the cluster and bootstrapped it using Argo CD’s App of apps pattern to deploy the applications into the cluster. Network policies were also deployed to the cluster using the Argo CD. This was tested in a non-production environment to migrate from the third-party CNI to VPC CNI to make sure that applications and services passed functional tests. 
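For illustration, the kind of policy that carries over unchanged is a plain upstream NetworkPolicy such as the sketch below (the namespace, labels, and port are hypothetical); anything expressed as a Calico or Cilium CRD has to be converted to this form before the cutover.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api      # illustrative name
  namespace: payments              # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: api                     # the pods this policy protects
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend            # only frontend pods may connect
    ports:
    - protocol: TCP
      port: 8080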
Then we could safely migrate the traffic from the old cluster to the new cluster without risks by leveraging the same strategy in the production environment. Lessons learned Amazon VPC CNI uses the VPC IP space to assign IP addresses to k8s pods. This led us to realize our existing VPCs were not properly sized to meet the growing number of k8s pods. We added a permitted secondary CIDR block 100.64.0.0/10 to the VPC and configured VPC CNI Custom Networking feature to assign those IP addresses to the k8s pods. This proactive measure makes sure of scalability as our infrastructure expands, mitigating concerns about IP address exhaustion. Leveraging automation and Infrastructure-as-Code (IaC) is recommended, especially as we are replicating existing clusters and migrating the workloads to them. Conclusion In this post, we discussed how Rippling migrated from third-party CNI to Amazon VPC CNI in their Amazon EKS clusters and enabled network policy support to secure pod-to-pod communications. Rippling used the blue-green strategy for the migration to minimize the application impact, and safely cut over the traffic to the new cluster. This migration helped Rippling to use the native features offered by AWS and reduced the burden of managing the operational software in our EKS clusters. Venkatesh Nannan, Sr. Engineering Manager – Infrastructure at Rippling Venkatesh Nannan is a seasoned Engineering leader with expertise in building scalable cloud-native applications, specializing in backend development and infrastructure architecture. View the full article
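As a hedged sketch of the custom networking setup mentioned in the lessons learned above (the subnet and security group IDs are placeholders, not Rippling's actual values), each Availability Zone gets an ENIConfig that points at a subnet carved out of the 100.64.0.0/10 secondary CIDR, and the VPC CNI is then configured to use it:

apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a                  # one ENIConfig per Availability Zone
spec:
  subnet: subnet-0123456789abcdef0  # placeholder: subnet from the 100.64.0.0/10 secondary CIDR
  securityGroups:
  - sg-0123456789abcdef0            # placeholder: security group applied to pod ENIs
# Custom networking is then enabled on the VPC CNI, for example by setting the
# AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true environment variable on the aws-node DaemonSet.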
  11. Patent-search platform provider IPRally is growing quickly, servicing global enterprises, IP law firms, and multiple national patent and trademark offices. As the company grows, so do its technology needs. It continues to train its models for greater accuracy, adding 200,000 searchable records for customer access weekly, and mapping new patents. With millions of patent documents published annually – and the technical complexity of those documents increasing — it can take even the most seasoned patent professional several hours of research to resolve a case with traditional patent search tools. In 2018, Finnish firm IPRally set out to tackle this problem with a graph-based approach. “Search engines for patents were mostly complicated boolean ones, where you needed to spend hours building a complicated query,” says Juho Kallio, CTO and co-founder of the 50-person firm. “I wanted to build something important and challenging.” Using machine learning (ML) and natural language processing (NLP), the company has transformed the text from over 120 million global patent documents into document-level knowledge graphs embedded into a searchable vector space. Now, patent researchers can receive relevant results in seconds with AI-selected highlights of key information and explainable results. To meet those needs, IPRally built a customized ML platform using Google Kubernetes Engine (GKE) and Ray, an open-source ML framework, balancing efficiency, performance and streamlining machine learning operations (MLOps). The company uses open-source KubeRay to deploy and manage Ray on GKE, which enables them to leverage cost-efficient NVIDIA GPU Spot instances for exploratory ML research and development. It also uses Google Cloud data building blocks, including Cloud Storage and Compute Engine persistent disks. Next on the horizon is expanding to big data solutions with Ray Data and BigQuery. “Ray on GKE has the ability to support us in the future with any scale and any kind of distributed complex deep learning,” says Kallio. A custom ML platform built for performance and efficiency The IPRally engineering team’s primary focus is on R&D and how it can continue to improve its Graph AI to make technical knowledge more accessible. With just two DevOps engineers and one MLOps engineer, IPRally was able to build its own customized ML platform with GKE and Ray as key components. A big proponent of open source, IPRally transitioned everything to Kubernetes when their compute needs grew. However, they didn’t want to have to manage Kubernetes themselves. That led them to GKE, with its scalability, flexibility, open ecosystem, and its support for a diverse set of accelerators. All told, this provides IPRally the right balance of performance and cost, as well as easy management of compute resources and the ability to efficiently scale down capacity when they don’t need it. “GKE provides the scalability and performance we need for these complex training and serving needs and we get the right granularity of control over data and compute,” says Kallio. One particular GKE capability that Kallio highlights is container image streaming, which has significantly accelerated their start-up time. “We have seen that container image streaming in GKE has a significant impact on expediting our application startup time. Image streaming helps us accelerate our start-up time for a training job after submission by 20%,” he shares. 
“And, when we are able to reuse an existing pod, we can start up in a few seconds instead of minutes.” The next layer is Ray, which the company uses to scale the distributed, parallelized Python and Clojure applications it uses for machine learning. To more easily manage Ray, IPRally uses KubeRay, a specialized tool that simplifies Ray cluster management on Kubernetes. IPRally uses Ray for the most advanced tasks like massive preprocessing of data and exploratory deep learning in R&D. “Interoperability between Ray and GKE autoscaling is smooth and robust. We can combine computational resources without any constraints,” says Kallio. The heaviest ML loads are mainly deployed on G2 VMs featuring eight NVIDIA L4 GPUs featuring up to eight NVIDIA L4 Tensor Core GPUs, which deliver cutting-edge performance-per-dollar for AI inference workloads. And by leveraging them within GKE, IPRally facilitates the creation of nodes on-demand, scales GPU resources as needed, thus optimizing its operational costs. There is a single Terraform-provisioned Kubernetes cluster in each of the regions that IPRally searches for the inexpensive spot instances. GKE and Ray then step in for compute orchestration and automated scaling. To further ease MLOps, IPRally built its own thin orchestration layer, IPRay, atop KubeRay and Ray. This layer provides a command line tool for data scientists to easily provision a templated Ray cluster that scales efficiently up and down and that can run jobs in Ray without needing to know Terraform. This self-service layer reduces friction and allows both engineers and data scientists to focus on their higher-value work. Technology paves the way for strong growth Through this selection of Google Cloud and open-source frameworks, IPRally has shown that a startup can build an enterprise-grade ML platform without spending millions of dollars. Focusing on providing a powerful MLOps and automation foundation from its earliest days has paid dividends in efficiency and the team’s ability to focus on R&D. “Crafting a flexible ML infrastructure from the best parts has been more than worth it,” shares Jari Rosti, an ML engineer at IPRally. “Now, we’re seeing the benefits of that investment multiply as we adapt the infrastructure to the constantly evolving ideas of modern ML. That’s something other young companies can achieve as well by leveraging Google Cloud and Ray.” Further, the company has been saving 70% of ML R&D costs by using Spot instances. These affordable instances offer the same quality VMs as on-demand instances but are subject to interruption. But because IPRally’s R&D workloads are fault-tolerant, they are a good fit for Spot instances. IPRally closed a €10m A round investment last year, and it’s forging on with ingesting and processing IP documentation from around the globe, with a focus on improving its graph neural network models and building the best AI platform for patent searching. With 3.4 million patents filed in 2022, the third consecutive year of growth, data will keep flowing and IPRally can continue helping intellectual property professionals find every relevant bit of information. "With Ray on GKE, we've built an ML foundation that is a testament to how powerful Google Cloud is with AI," says Kallio. “And now, we’re prepared to explore far more advanced deep learning and to keep growing.” View the full article
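To make the Ray-on-GKE setup described above more concrete, here is a heavily simplified sketch of a KubeRay RayCluster that schedules its workers onto L4 GPU Spot nodes. The names, image tags, and sizes are illustrative assumptions, not IPRally's actual configuration.

apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: research-cluster              # illustrative name
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0  # illustrative version
  workerGroupSpecs:
  - groupName: gpu-workers
    replicas: 1
    minReplicas: 0
    maxReplicas: 8                     # autoscaled alongside GKE node autoscaling
    rayStartParams: {}
    template:
      spec:
        nodeSelector:
          cloud.google.com/gke-accelerator: nvidia-l4   # target L4 GPU nodes
          cloud.google.com/gke-spot: "true"             # prefer Spot capacity for R&D workloads
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0-gpu               # illustrative version
          resources:
            limits:
              nvidia.com/gpu: 1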
12. Azure Container Apps Azure Container Apps is a fully managed, serverless container service built on Kubernetes that can be compared to ECS on AWS or Cloud Run on GCP. Compared to AKS, all of the integrations with Azure are already done for you. The best example is managed identity: here you only need to enable a parameter, whereas in AKS it is complicated and changes every two years. View the full article
  13. GitOps represents a transformative approach to managing and deploying applications within Kubernetes environments, offering many benefits ranging from automation to enhanced collaboration. By centralizing operations around Git repositories, GitOps streamlines processes, fosters reliability, and nurtures teamwork. However, as teams embrace GitOps principles, the natural question arises: can these principles extend to managing databases? The answer is a resounding yes! Yet, while GitOps seamlessly aligns with stateless application management, applying it to stateful workloads, especially databases, presents distinct challenges. In this article, we’ll delve into the landscape of implementing GitOps for stateful applications and databases in Kubernetes, exploring five essential best practices to navigate this terrain effectively. Considerations for applying GitOps to stateful applications Versioning and managing stateful data When applying GitOps principles, it’s crucial to version control persistent data alongside application code. Tools like Git LFS (Large File Storage) can help you manage large datasets efficiently. Ensure that changes to stateful data are captured in Git commits and properly documented to maintain data integrity and facilitate reproducibility. Handling database schema changes and migrations Database schema changes and migrations require careful handling in GitOps workflows. Define database schema changes as code and store migration scripts in version-controlled repositories. Test and apply migrations consistently across environments with automated tools and continuous integration/continuous delivery processes. Backup and disaster recovery strategies Develop robust backup and disaster recovery strategies for stateful applications. Regularly back up data and configuration files to resilient storage solutions. Test data recovery and automate backup procedures to ensure preparedness for unforeseen events or data loss. You can also leverage GitOps practices to manage backup configurations and version-controlled recovery plans. Managing migrations like any other GitOps application Migration should follow suit as applications are deployed and managed using GitOps principles. This means defining migration tasks as declarative configurations stored in Git repositories alongside other application artifacts. These migration configurations should specify the desired state of the database schema or data transformation, including any dependencies or prerequisites. GitOps operators, such as the Atlas Operator for databases, can then pull these migration configurations from Git repositories and apply them to target databases. The operator ensures that the database’s actual state aligns with the desired state defined in the Git repository, automating the process of applying migrations and maintaining consistency across environments. Handling stateful application upgrades and rollbacks Planned and executed stateful application upgrades and rollbacks carefully. Define upgrade strategies that minimize downtime and data loss during the migration process to minimize downtime and data loss. Utilize GitOps principles to manage version-controlled manifests for application upgrades, ensuring consistency and reproducibility across environments. Implement automated rollback mechanisms to revert to previous application versions in case of failures or issues during upgrades. 
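To make the "migrations as declarative configuration" idea concrete, here is a minimal, generic sketch of a migration expressed as a Kubernetes Job that lives in Git next to the application manifests and is applied by the GitOps controller. The names, image, and Secret are hypothetical; in practice a purpose-built operator such as the Atlas Operator mentioned above would replace the raw Job.

apiVersion: batch/v1
kind: Job
metadata:
  name: orders-db-migrate-0007        # hypothetical: one Job per versioned migration
spec:
  backoffLimit: 0                     # fail fast so the GitOps tool surfaces the failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/orders-migrations:0007   # hypothetical image containing the migration scripts
        envFrom:
        - secretRef:
            name: orders-db-credentials                       # hypothetical Secret with the database connection string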
Best practices for implementing GitOps with stateful applications Infrastructure as Code (IaC) for provisioning storage resources Make your storage resources part of your Infrastructure as Code (IaC) practices. Define your storage configurations using tools like Terraform or Kubernetes manifests, and keep them version-controlled alongside your application code. This ensures consistency and reproducibility in your infrastructure deployments. Using Helm charts or operators Simplify the deployment and management of your stateful applications, including databases, by using Helm charts or Kubernetes operators. Helm helps package and template complex configurations while operators automate common operational tasks. Pick the best fit for your needs and keep your application management consistent. Implementing automated testing Automate your testing as much as possible for your stateful applications, including database changes. Develop thorough test suites to check everything from functionality to performance. Tools like Kubernetes Testing Framework (KTF) can help simulate production-like environments and catch issues early on. Continuous (CI/CD) pipelines Set up CI/CD pipelines tailored to your stateful applications, focusing on databases. Automate your build, test, and deployment processes to ensure smooth operation. Remember to trigger pipeline executions based on version-controlled changes so you have consistent deployments across different environments. Storing everything in Git By storing everything in Git repositories, teams benefit from version control, traceability, and collaboration. Every change made to configurations or migrations is tracked, providing a clear history of modifications and enabling easy rollback to previous states if necessary. Moreover, Git’s branching and merging capabilities facilitate collaborative development efforts, allowing multiple team members to work concurrently on different features or fixes without stepping on each other’s toes. Mastering GitOps for stateful workloads Applying GitOps principles to stateful applications, including databases, brings numerous benefits to development and operations teams. By storing everything in Git repositories, including data, migration scripts, and configurations, teams ensure version control, traceability, and collaboration. Handling database schema changes, migrations, and backups within GitOps workflows ensures consistency and reliability across environments. Moreover, managing migrations and upgrades as part of GitOps applications streamlines deployment processes and reduces the risk of errors. Implementing best practices such as Infrastructure as Code (IaC), leveraging Helm charts or operators, and implementing automated testing and CI/CD pipelines further enhances the efficiency and reliability of managing stateful applications. By adopting GitOps for stateful applications, organizations can achieve greater agility, scalability, and resilience in their software delivery processes. With a solid foundation of GitOps principles and best practices in place, teams can confidently navigate the complexities of managing stateful applications in Kubernetes environments, enabling them to focus on efficiently delivering value to their customers. The post 5 Best practices for implementing GitOps for stateful applications and databases in Kubernetes appeared first on Amazic. View the full article
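As one concrete way to wire the "store everything in Git" practice together, an Argo CD Application can point at the Git directory that holds both the application manifests and the migration Jobs, so every change is versioned and automatically synced. This is a minimal sketch; the repository URL, path, and namespaces are placeholders.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: orders-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/orders-gitops.git   # placeholder repository
    targetRevision: main
    path: environments/prod                                 # placeholder path holding manifests and migrations
  destination:
    server: https://kubernetes.default.svc
    namespace: orders                                       # placeholder target namespace
  syncPolicy:
    automated:
      prune: true       # remove resources deleted from Git
      selfHeal: true    # revert out-of-band changes to match Git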
  14. Starting today, customers can receive granular cost visibility for Amazon Elastic Kubernetes Service (Amazon EKS) in the AWS Cost and Usage Reports (CUR), enabling you to analyze, optimize, and chargeback cost and usage for your Kubernetes applications. With AWS Split Cost Allocation Data for Amazon EKS, customers can now allocate application costs to individual business units and teams based on how Kubernetes applications consume shared EC2 CPU and memory resources. View the full article
  15. Editors: Amit Dsouza, Frederick Kautz, Kristin Martin, Abigail McCarthy, Natali Vlatko Announcing the release of Kubernetes v1.30: Uwubernetes, the cutest release! Similar to previous releases, the release of Kubernetes v1.30 introduces new stable, beta, and alpha features. The consistent delivery of top-notch releases underscores the strength of our development cycle and the vibrant support from our community. This release consists of 45 enhancements. Of those enhancements, 17 have graduated to Stable, 18 are entering Beta, and 10 have graduated to Alpha. Release theme and logo Kubernetes v1.30: Uwubernetes Kubernetes v1.30 makes your clusters cuter! Kubernetes is built and released by thousands of people from all over the world and all walks of life. Most contributors are not being paid to do this; we build it for fun, to solve a problem, to learn something, or for the simple love of the community. Many of us found our homes, our friends, and our careers here. The Release Team is honored to be a part of the continued growth of Kubernetes. For the people who built it, for the people who release it, and for the furries who keep all of our clusters online, we present to you Kubernetes v1.30: Uwubernetes, the cutest release to date. The name is a portmanteau of “kubernetes” and “UwU,” an emoticon used to indicate happiness or cuteness. We’ve found joy here, but we’ve also brought joy from our outside lives that helps to make this community as weird and wonderful and welcoming as it is. We’re so happy to share our work with you. UwU Improvements that graduated to stable in Kubernetes v1.30 This is a selection of some of the improvements that are now stable following the v1.30 release. Robust VolumeManager reconstruction after kubelet restart (SIG Storage) This is a volume manager refactoring that allows the kubelet to populate additional information about how existing volumes are mounted during the kubelet startup. In general, this makes volume cleanup after kubelet restart or machine reboot more robust. This does not bring any changes for user or cluster administrators. We used the feature process and feature gate NewVolumeManagerReconstruction to be able to fall back to the previous behavior in case something goes wrong. Now that the feature is stable, the feature gate is locked and cannot be disabled. Prevent unauthorized volume mode conversion during volume restore (SIG Storage) For Kubernetes 1.30, the control plane always prevents unauthorized changes to volume modes when restoring a snapshot into a PersistentVolume. As a cluster administrator, you'll need to grant permissions to the appropriate identity principals (for example: ServiceAccounts representing a storage integration) if you need to allow that kind of change at restore time. Warning: Action required before upgrading. The prevent-volume-mode-conversion feature flag is enabled by default in the external-provisioner v4.0.0 and external-snapshotter v7.0.0. Volume mode change will be rejected when creating a PVC from a VolumeSnapshot unless you perform the steps described in the the "Urgent Upgrade Notes" sections for the external-provisioner 4.0.0 and the external-snapshotter v7.0.0. For more information on this feature also read converting the volume mode of a Snapshot. Pod Scheduling Readiness (SIG Scheduling) Pod scheduling readiness graduates to stable this release, after being promoted to beta in Kubernetes v1.27. 
This now-stable feature lets Kubernetes avoid trying to schedule a Pod that has been defined, when the cluster doesn't yet have the resources provisioned to allow actually binding that Pod to a node. That's not the only use case; the custom control on whether a Pod can be allowed to schedule also lets you implement quota mechanisms, security controls, and more. Crucially, marking these Pods as exempt from scheduling cuts the work that the scheduler would otherwise do, churning through Pods that can't or won't schedule onto the nodes your cluster currently has. If you have cluster autoscaling active, using scheduling gates doesn't just cut the load on the scheduler, it can also save money. Without scheduling gates, the autoscaler might otherwise launch a node that doesn't need to be started. In Kubernetes v1.30, by specifying (or removing) a Pod's .spec.schedulingGates, you can control when a Pod is ready to be considered for scheduling. This is a stable feature and is now formally part of the Kubernetes API definition for Pod. Min domains in PodTopologySpread (SIG Scheduling) The minDomains parameter for PodTopologySpread constraints graduates to stable this release, which allows you to define the minimum number of domains. This feature is designed to be used with Cluster Autoscaler. If you previously attempted use and there weren't enough domains already present, Pods would be marked as unschedulable. The Cluster Autoscaler would then provision node(s) in new domain(s), and you'd eventually get Pods spreading over enough domains. Go workspaces for k/k (SIG Architecture) The Kubernetes repo now uses Go workspaces. This should not impact end users at all, but does have a impact for developers of downstream projects. Switching to workspaces caused some breaking changes in the flags to the various k8s.io/code-generator tools. Downstream consumers should look at staging/src/k8s.io/code-generator/kube_codegen.sh to see the changes. For full details on the changes and reasons why Go workspaces was introduced, read Using Go workspaces in Kubernetes. Improvements that graduated to beta in Kubernetes v1.30 This is a selection of some of the improvements that are now beta following the v1.30 release. Node log query (SIG Windows) To help with debugging issues on nodes, Kubernetes v1.27 introduced a feature that allows fetching logs of services running on the node. To use the feature, ensure that the NodeLogQuery feature gate is enabled for that node, and that the kubelet configuration options enableSystemLogHandler and enableSystemLogQuery are both set to true. Following the v1.30 release, this is now beta (you still need to enable the feature to use it, though). On Linux the assumption is that service logs are available via journald. On Windows the assumption is that service logs are available in the application log provider. Logs are also available by reading files within /var/log/ (Linux) or C:\var\log\ (Windows). For more information, see the log query documentation. CRD validation ratcheting (SIG API Machinery) You need to enable the CRDValidationRatcheting feature gate to use this behavior, which then applies to all CustomResourceDefinitions in your cluster. Provided you enabled the feature gate, Kubernetes implements validation racheting for CustomResourceDefinitions. The API server is willing to accept updates to resources that are not valid after the update, provided that each part of the resource that failed to validate was not changed by the update operation. 
In other words, any invalid part of the resource that remains invalid must have already been wrong. You cannot use this mechanism to update a valid resource so that it becomes invalid. This feature allows authors of CRDs to confidently add new validations to the OpenAPIV3 schema under certain conditions. Users can update to the new schema safely without bumping the version of the object or breaking workflows. Contextual logging (SIG Instrumentation) Contextual Logging advances to beta in this release, empowering developers and operators to inject customizable, correlatable contextual details like service names and transaction IDs into logs through WithValues and WithName. This enhancement simplifies the correlation and analysis of log data across distributed systems, significantly improving the efficiency of troubleshooting efforts. By offering a clearer insight into the workings of your Kubernetes environments, Contextual Logging ensures that operational challenges are more manageable, marking a notable step forward in Kubernetes observability. Make Kubernetes aware of the LoadBalancer behaviour (SIG Network) The LoadBalancerIPMode feature gate is now beta and is now enabled by default. This feature allows you to set the .status.loadBalancer.ingress.ipMode for a Service with type set to LoadBalancer. The .status.loadBalancer.ingress.ipMode specifies how the load-balancer IP behaves. It may be specified only when the .status.loadBalancer.ingress.ip field is also specified. See more details about specifying IPMode of load balancer status. New alpha features Speed up recursive SELinux label change (SIG Storage) From the v1.27 release, Kubernetes already included an optimization that sets SELinux labels on the contents of volumes, using only constant time. Kubernetes achieves that speed up using a mount option. The slower legacy behavior requires the container runtime to recursively walk through the whole volumes and apply SELinux labelling individually to each file and directory; this is especially noticable for volumes with large amount of files and directories. Kubernetes 1.27 graduated this feature as beta, but limited it to ReadWriteOncePod volumes. The corresponding feature gate is SELinuxMountReadWriteOncePod. It's still enabled by default and remains beta in 1.30. Kubernetes 1.30 extends support for SELinux mount option to all volumes as alpha, with a separate feature gate: SELinuxMount. This feature gate introduces a behavioral change when multiple Pods with different SELinux labels share the same volume. See KEP for details. We strongly encourage users that run Kubernetes with SELinux enabled to test this feature and provide any feedback on the KEP issue. Feature gate Stage in v1.30 Behavior change SELinuxMountReadWriteOncePod Beta No SELinuxMount Alpha Yes Both feature gates SELinuxMountReadWriteOncePod and SELinuxMount must be enabled to test this feature on all volumes. This feature has no effect on Windows nodes or on Linux nodes without SELinux support. Recursive Read-only (RRO) mounts (SIG Node) Introducing Recursive Read-Only (RRO) Mounts in alpha this release, you'll find a new layer of security for your data. This feature lets you set volumes and their submounts as read-only, preventing accidental modifications. Imagine deploying a critical application where data integrity is key—RRO Mounts ensure that your data stays untouched, reinforcing your cluster's security with an extra safeguard. 
This is especially crucial in tightly controlled environments, where even the slightest change can have significant implications. Job success/completion policy (SIG Apps) From Kubernetes v1.30, indexed Jobs support .spec.successPolicy to define when a Job can be declared succeeded based on succeeded Pods. This allows you to define two types of criteria: succeededIndexes indicates that the Job can be declared succeeded when these indexes succeeded, even if other indexes failed. succeededCount indicates that the Job can be declared succeeded when the number of succeeded Indexes reaches this criterion. After the Job meets the success policy, the Job controller terminates the lingering Pods. Traffic distribution for services (SIG Network) Kubernetes v1.30 introduces the spec.trafficDistribution field within a Kubernetes Service as alpha. This allows you to express preferences for how traffic should be routed to Service endpoints. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences (such as routing to topologically closer endpoints). This can help optimize for performance, cost, or reliability. You can use this field by enabling the ServiceTrafficDistribution feature gate for your cluster and all of its nodes. In Kubernetes v1.30, the following field value is supported: PreferClose: Indicates a preference for routing traffic to endpoints that are topologically proximate to the client. The interpretation of "topologically proximate" may vary across implementations and could encompass endpoints within the same node, rack, zone, or even region. Setting this value gives implementations permission to make different tradeoffs, for example optimizing for proximity rather than equal distribution of load. You should not set this value if such tradeoffs are not acceptable. If the field is not set, the implementation (like kube-proxy) will apply its default routing strategy. See Traffic Distribution for more details. Graduations, deprecations and removals for Kubernetes v1.30 Graduated to stable This lists all the features that graduated to stable (also known as general availability). For a full list of updates including new features and graduations from alpha to beta, see the release notes. This release includes a total of 17 enhancements promoted to Stable: Container Resource based Pod Autoscaling Remove transient node predicates from KCCM's service controller Go workspaces for k/k Reduction of Secret-based Service Account Tokens CEL for Admission Control CEL-based admission webhook match conditions Pod Scheduling Readiness Min domains in PodTopologySpread Prevent unauthorised volume mode conversion during volume restore API Server Tracing Cloud Dual-Stack --node-ip Handling AppArmor support Robust VolumeManager reconstruction after kubelet restart kubectl delete: Add interactive(-i) flag Metric cardinality enforcement Field status.hostIPs added for Pod Aggregated Discovery Deprecations and removals Removed the SecurityContextDeny admission plugin, deprecated since v1.27 (SIG Auth, SIG Security, and SIG Testing) With the removal of the SecurityContextDeny admission plugin, the Pod Security Admission plugin, available since v1.25, is recommended instead. Release notes Check out the full details of the Kubernetes 1.30 release in our release notes. Availability Kubernetes 1.30 is available for download on GitHub. To get started with Kubernetes, check out these interactive tutorials or run local Kubernetes clusters using minikube. 
You can also easily install 1.30 using kubeadm. Release team Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management. We would like to thank the entire release team for the hours spent hard at work to deliver the Kubernetes v1.30 release to our community. The Release Team's membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. A very special thanks goes out our release lead, Kat Cosgrove, for supporting us through a successful release cycle, advocating for us, making sure that we could all contribute in the best way possible, and challenging us to improve the release process. Project velocity The CNCF K8s DevStats project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem. In the v1.30 release cycle, which ran for 14 weeks (January 8 to April 17), we saw contributions from 863 companies and 1391 individuals. Event update KubeCon + CloudNativeCon China 2024 will take place in Hong Kong, from 21 – 23 August 2024! You can find more information about the conference and registration on the event site. KubeCon + CloudNativeCon North America 2024 will take place in Salt Lake City, Utah, The United States of America, from 12 – 15 November 2024! You can find more information about the conference and registration on the eventsite. Upcoming release webinar Join members of the Kubernetes v1.30 release team on Thursday, May 23rd, 2024, at 9 A.M. PT to learn about the major features of this release, as well as deprecations and removals to help plan for upgrades. For more information and registration, visit the event page on the CNCF Online Programs site. Join members of the Kubernetes v1.30 release team on DATE AND TIME TBA to learn about the major features of this release, as well as deprecations and removals to help plan for upgrades. For more information and registration, visit the event page on the CNCF Online Programs site. Get involved The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support. Follow us on 𝕏 @Kubernetesio for latest updates Join the community discussion on Discuss Join the community on Slack Post questions (or answer questions) on Stack Overflow Share your Kubernetes story Read more about what’s happening with Kubernetes on the blog Learn more about the Kubernetes Release Team View the full article
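Returning to Pod scheduling readiness, which graduated to stable in this release: a minimal sketch of a gated Pod looks like the following (the gate name, pod name, and image are illustrative). The scheduler ignores the Pod until a controller removes the entry from .spec.schedulingGates.

apiVersion: v1
kind: Pod
metadata:
  name: gated-workload                   # illustrative name
spec:
  schedulingGates:
  - name: example.com/capacity-ready     # illustrative gate; remove it to make the Pod schedulable
  containers:
  - name: app
    image: nginx                         # illustrative image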
  16. Google Kubernetes Engine (GKE) has emerged as a leading container orchestration platform for deploying and managing containerized applications. But GKE is not just limited to running stateless microservices — its flexible design supports workloads that need to persist data as well, via tight integrations with persistent disk or blob storage products including Persistent Disk, Cloud Storage, and FIlestore. And for organizations that need even stronger throughput and performance, there’s Hyperdisk, Google Cloud's next-generation block storage service that allows you to modify your capacity, throughput, and IOPS-related performance and tailor it to your workloads, without having to re-deploy your entire stack. Today, we’re excited to introduce support for Hyperdisk Balanced storage volumes on GKE, which joins the Hyperdisk Extreme and Hyperdisk Throughput options, and is a good fit for workloads that typically rely on persistent SSDs — for example, line-of-business applications, web applications, and databases. Hyperdisk Balanced is supported on 3rd+ generation instance types. For instance type compatibility please reference this page. Understanding Hyperdisk First, let’s start with what it means to fine-tune your throughput with Hyperdisk? What about tuning IOPS and capacity? Tuning IOPS means refining the input/output operations per second (IOPS) performance of a storage device. Hyperdisk allows you to provision only the IOPS you need, and does not share it with other volumes on the same node. Tuning throughput means enhancing the amount of data or information that can be transferred or processed in a specified amount of time. Hyperdisk allows you to specify exactly how much throughput a given storage volume should have without limitations imposed by other volumes on the same node. Expanding capacity means you can increase the size of your storage volume. Hyperdisk can be provisioned for the exact capacity you need and extended as your storage needs grow. These Hyperdisk capabilities translate in to the following benefits: First, you can transform your environment's stateful environment through flexibility, ease of use, scalability and management — with a potential cost savings to boost. Imagine the benefit of a storage area network environment without the management overhead. Second, you can build lower-cost infrastructure by rightsizing the machine types that back your GKE nodes, optimizing your GKE stack while integrating with GKE CI/CD, security and networking capabilities. Finally, you get predictability — the consistency that comes with fine-grained tuning for each individual node and its required IOPS. You can also use this to fine-tune for ML model building/training/deploying, as Hyperdisk removes the element of throughput and IOPS from all PDs sharing the same node bandwidth, placing it on the provisioned Hyperdisk beneath it. Compared with traditional persistent disk, Hyperdisk’s storage performance is decoupled from the node your application is running on. This gives you more flexibility with your IOPs and throughput, while reducing the possibility that overall storage performance would be impacted by a noisy neighbor. On GKE, the following types of Hyperdisk volumes are available: Hyperdisk Balanced - General-purpose volume type that is the best fit for most workloads, with up to 2.4GBps of throughput and 160k IOPS. Ideal for line-of-business applications, web applications, databases, or boot disks. 
Hyperdisk Throughput - Optimized for cost-efficient high-throughput workloads, with up to 3 GBps throughput (>=128 KB IO size). Hyperdisk Throughput is targeted at scale-out analytics (e.g., Kafka, Hadoop, Redpanda) and other throughput-oriented cost-sensitive workloads, and is supported on GKE Autopilot and GKE Standard clusters.

Hyperdisk Extreme - Specifically optimized for IOPS performance, such as large-scale databases that require high IOPS performance. Supported on Standard clusters only.

Getting started with Hyperdisk on GKE

To provision Hyperdisk you first need to make sure your cluster has the necessary StorageClass loaded that references the disk. You can add the necessary IOPS/Throughput to the storage class or go with the defaults, which are 3,600 IOPs and 140MiBps (Docs).

YAML to Apply to GKE cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-storage
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: hyperdisk-balanced
  provisioned-throughput-on-create: "200Mi" # optional
  provisioned-iops-on-create: "5000" # optional

After you've configured the StorageClass, you can create a persistent volume claim that references it.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: balanced-storage
  resources:
    requests:
      storage: 1000Gi

Take control of your GKE storage with Hyperdisk

To recap, Hyperdisk lets you take control of your cloud infrastructure storage, throughput, IOPS, and capacity, combining the speed of existing PD SSD volumes with the flexibility to fine-tune to your specific needs, by decoupling disks from the node that your workload is running on. For more, check out the following sessions from Next ‘24: Next generation storage: Designing storage for the future A primer on data on Kubernetes And be sure to check out these resources: How to create a Hyperdisk on GKE Data management solutions on GKE Contact your Google Cloud representative View the full article
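To round out the example above, a workload consumes the Hyperdisk Balanced volume simply by mounting the postgres claim. This is a minimal sketch; the Pod name, image, and mount path are illustrative, and the node must be a 3rd+ generation instance type that supports Hyperdisk Balanced.

apiVersion: v1
kind: Pod
metadata:
  name: postgres-demo                 # illustrative name
spec:
  containers:
  - name: postgres
    image: postgres:16                # illustrative image
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres             # the PVC defined above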
  17. Artificial Intelligence (AI) and Kubernetes are pillars of modern technology, each contributing significantly to innovation and efficiency. With AI adoption skyrocketing across industries, the demand for robust infrastructure to support AI workloads has surged. According to a recent report by Gartner, global spending on AI is projected to reach $297 billion by 2027 from $124 billion in 2022, with businesses increasingly investing in AI-driven solutions to gain a competitive edge. Concurrently, Kubernetes has emerged as the de facto standard for container orchestration, witnessing a remarkable growth trajectory. CNCF published the results of its latest microsurvey report on cloud-native FinOps and cloud financial management (CFM). Kubernetes has driven cloud spending up for 49% of respondents, while 28% stated their costs remain unchanged and 24% saved after migrating to Kubernetes. This intersection of AI and Kubernetes signifies a paradigm shift in technology, empowering organizations to harness the power of AI at scale while leveraging Kubernetes’ agility and scalability for seamless deployment and management. This post will examine Kubernetes’s role in managing end-to-end AI pipelines, including developing, training, and deploying AI models at each procedure phase. We’ll discuss how Kubernetes facilitates the creation of efficient, repeatable workflows by data scientists and machine learning engineers, increasing output and accelerating innovation in the AI space. Understanding Kubernetes Before discussing Kubernetes’s importance in AI, let’s look at its definition and how it works. Kubernetes, often known as K8s, is an open-source container orchestration tool first developed by Google. It simplifies and automates containerized applications’ scaling, deployment, and management. Containers allow for the lightweight, portable packaging and deployment of applications along with their dependencies and customizations. Kubernetes nullifies the infrastructure concerns with a platform for delivering and managing containerized workloads. Empowering Innovation: The Synergy Between AI and Kubernetes Artificial intelligence (AI) and Kubernetes work hand in hand at the forefront of modern technology. Artificial intelligence (AI) is transforming industries through intelligent automation and decision-making. Meanwhile, Kubernetes provides the dependable infrastructure for deploying, scaling, and managing AI applications. Kubernetes facilitates the seamless coordination of AI workloads across several environments, optimizing resource usage and guaranteeing dependability. In return, AI leverages the scalability and agility of Kubernetes to provide innovative solutions that increase productivity and encourage business growth. Kubernetes and AI work together to produce a dynamic synergy that allows businesses to take full advantage of AI technologies in the rapidly evolving digital ecosystem. End-to-End AI Pipelines AI pipelines are the complex and interconnected procedures utilized in creating, training, and applying AI models. These pipelines typically include data preprocessing, model training, assessment, tuning, and deployment. Effective management of these pipelines at all levels requires automation and coordination. Kubernetes provides the infrastructure needed to orchestrate end-to-end AI pipelines with ease. Let us discuss how Kubernetes facilitates AI model development, training, and deployment. 
Development Phase During the development stage of an AI project, data scientists and machine learning engineers experiment with various algorithms, datasets, and model architectures to construct and enhance AI models. Setting up development environments is made easier by Kubernetes, which isolates every stage of the AI pipeline behind containers. Developers can define Kubernetes manifests, representing application components’ desired state, such as networking configurations, volumes, and containers. Subsequently, Kubernetes automatically schedules and starts these containers across the cluster, ensuring consistent and repeatable development environments. Training Phase After the model design is finished, the model is trained on large datasets. Training deep learning models sometimes requires a lot of processing power, such as GPUs or TPUs, for speedier processing. Two of Kubernetes’ advantages are its ability to independently scale resources in response to demand and distribute computational jobs throughout the cluster. Data scientists can use Kubernetes’ horizontal scaling characteristics to train many models in parallel and save significant training time. Furthermore, Kubernetes makes resource limits and quotas easier to implement, ensuring fair resource allocation and preventing conflicts between multiple teams or projects. Evaluation and Tuning After training, AI models must be evaluated using validation datasets to determine their performance. By integrating with tools like Kubeflow and TensorFlow Extended (TFX), Kubernetes enables hyperparameter tweaking and automatic model evaluation. These frameworks provide prebuilt components for creating and managing AI pipelines on Kubernetes clusters. Data scientists may speed up the iterative process of improving model performance by developing workflows that automate model review, model selection, and hyperparameter tuning. Deployment Phase Once a model is sufficiently accurate, it must be applied to predict new data in real-world scenarios. The Kubernetes platform facilitates the deployment of AI models by eliminating infrastructure-related issues and providing tools for container orchestration and service discovery. Data scientists can bundle learned models into container images using platforms like Docker or Kubernetes’ built-in support for custom resources like Custom Resource Definitions (CRDs). When these containerized models are deployed as microservices, they may be accessed via RESTful APIs or gRPC endpoints. Scaling and Monitoring In industrial applications, AI models could face varying workloads and demand levels. Thanks to Kubernetes ‘ auto-scaling functionality, resources can be dynamically changed based on real-time data, such as CPU usage, RAM consumption, or other application-specific indicators. This ensures optimal effectiveness and efficient use of resources, particularly during periods of high demand. Furthermore, Kubernetes provides information about the health and functionality of AI applications with easy integration with logging and monitoring tools such as Prometheus and Grafana. Data scientists can set up alerts and dashboards to monitor key indicators and respond quickly to any anomalies or issues. Reproducibility and Portability One of Kubernetes’ key advantages for AI pipelines is its reproducibility and portability. Kubernetes manifests are used to declaratively specify the desired state of the application, including dependencies, configurations, and environment variables. 
These manifests can be version-managed using Git or other version control systems, which promotes collaboration and repeatability in various settings. Furthermore, Kubernetes abstracts away the underlying infrastructure, simplifying the installation of AI pipelines on any cloud provider or internal data center with minimal to no adjustments. In conclusion, Kubernetes is essential to coordinating end-to-end AI pipelines, which include creating, training, and implementing AI models at every stage of the process. By automating container orchestration and abstracting away infrastructure complexities, Kubernetes frees data scientists and machine learning engineers to concentrate on innovation rather than infrastructure maintenance. Organizations may use Kubernetes to provide AI-powered apps that match the demands of today’s changing business landscape, increase productivity, and accelerate the speed of AI innovation. The post Kubernetes & its Role in AI: Orchestrating End-to-End AI Pipelines appeared first on Amazic. View the full article
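To make the training-phase discussion above concrete, here is a minimal sketch of a Kubernetes Job that requests a GPU for a containerized training run. The image name, command, and resource figures are illustrative assumptions, not part of the original article, and a working cluster would also need the appropriate GPU device plugin installed.

apiVersion: batch/v1
kind: Job
metadata:
  name: train-model                  # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/ml/trainer:1.0   # assumed image
        command: ["python", "train.py", "--epochs", "10"]
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            nvidia.com/gpu: 1        # schedules the pod onto a GPU node

Several such Jobs can be submitted in parallel to train multiple models at once, which is the horizontal-scaling pattern the article describes.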
  18. Kubernetes has transformed container orchestration, providing an effective framework for delivering and managing applications at scale. However, efficient storage management is essential to guarantee the dependability, security, and efficiency of your Kubernetes clusters. Benefits like preventing data loss, complying with regulations, and maintaining operational continuity while mitigating threats underscore the importance of security and dependability. This post will examine the top 10 best practices for Kubernetes storage, emphasizing encryption, access control, and safeguarding storage components.

Kubernetes Storage
Kubernetes storage is essential to contemporary cloud-native setups because it makes data persistence in containerized apps more effective. It provides a dependable and scalable storage resource management system that guarantees data permanence through container migrations and restarts. Persistent Volumes (PVs) and Persistent Volume Claims (PVCs), among other capabilities, give Kubernetes a versatile abstraction layer for managing storage. By providing dynamic provisioning of storage volumes tailored to particular workload requirements, storage classes further improve flexibility. Organizations can build and manage stateful applications with agility, scalability, and resilience in various computing settings by utilizing Kubernetes storage capabilities.

1. Data Encryption
Sensitive information kept in Kubernetes clusters must be protected with data encryption. Use tools like Kubernetes Secrets to safely store sensitive information such as SSH keys, API tokens, and passwords. Apply encryption both at rest and in transit to further protect data while it is stored and while it is transmitted between nodes.

2. Use Secrets Management Tools
Steer clear of hardcoding private information straight into Kubernetes manifests. Instead, use powerful secrets management solutions like Vault or Kubernetes Secrets to securely maintain and distribute secrets throughout your cluster. This guarantees that private information is encrypted and available only to approved users and applications.

3. Implement Role-Based Access Control (RBAC)
RBAC allows you to enforce fine-grained access controls on your Kubernetes clusters. Define roles and permissions to limit access to storage resources using the least-privilege concept. This prevents unauthorized users or apps from accessing or changing crucial storage components, lowering the possibility of data breaches and unauthorized access.

4. Secure Persistent Volumes (PVs) and Persistent Volume Claims (PVCs)
Ensure that persistent volumes and claims are adequately secured to avoid tampering or unwanted access. Put security rules in place to limit access to particular namespaces or users, and turn on encryption for information on persistent volumes. Regularly audit and monitor PVs and PVCs to identify and address any security flaws or unwanted access attempts.

5. Enable Network Policies
To manage network traffic between pods and storage resources, use Kubernetes network policies. To guarantee that only authorized pods and services may access storage volumes and endpoints, define firewall rules restricting communication to and from storage components. This reduces the possibility of data exfiltration and network-based attacks and prevents unauthorized network access.

6. Enable Role-Based Volume Provisioning
Utilize Kubernetes' dynamic volume provisioning features to automate storage volume creation and management.
To limit users’ ability to build or delete volumes based on their assigned roles and permissions, utilize role-based volume provisioning. This guarantees the effective and safe allocation of storage resources and helps prevent resource abuse. 7. Utilize Pod Security Policies To specify and implement security restrictions on pods’ access to storage resources, implement pod security policies. To manage pod rights, host resource access, and storage volume interactions, specify security policies. By implementing stringent security measures, you can reduce the possibility of privilege escalation, container escapes, and illegal access to storage components. 8. Regularly Update and Patch Kubernetes Components Monitor security flaws by regularly patching and updating Kubernetes components, including storage drivers and plugins. Keep your storage infrastructure safe from new attacks and vulnerabilities by subscribing to security advisories and adhering to best practices for Kubernetes cluster management. 9. Monitor and Audit Storage Activity To keep tabs on storage activity in your Kubernetes clusters, put extensive logging, monitoring, and auditing procedures in place. To proactively identify security incidents or anomalies, monitor access logs, events, and metrics on storage components. Utilize centralized logging and monitoring systems to see what’s happening with storage in your cluster. 10. Conduct Regular Security Audits and Penetration Testing Conduct comprehensive security audits and penetration tests regularly to evaluate the security posture of your Kubernetes storage system. Find and fix any security holes, incorrect setups, and deployment flaws in your storage system before hackers can exploit them. Work with security professionals and use automated security technologies to thoroughly audit your Kubernetes clusters. Considerations Before putting suggestions for Kubernetes storage into practice, take into account the following: Evaluate Security Requirements: Match storage options with compliance and corporate security requirements. Assess Performance Impact: Recognize the potential effects that resource usage and application performance may have from access controls, encryption, and security rules. Identify Roles and Responsibilities: Clearly define who is responsible for what when it comes to managing storage components in Kubernetes clusters. Plan for Scalability: Recognize the need for scalability and the possible maintenance costs related to implementing security measures. Make Monitoring and upgrades a Priority: To ensure that security measures continue to be effective over time, place a strong emphasis on continual monitoring, audits, and upgrades. Effective storage management is critical for ensuring the security, reliability, and performance of Kubernetes clusters. By following these ten best practices for Kubernetes storage, including encryption, access control, and securing storage components, you can strengthen the security posture of your Kubernetes environment and mitigate the risk of data breaches, unauthorized access, and other security threats. Stay proactive in implementing security measures and remain vigilant against emerging threats to safeguard your Kubernetes storage infrastructure effectively. The post Mastering Kubernetes Storage: 10 Best Practices for Security and Efficiency appeared first on Amazic. View the full article
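As an illustration of practices 3 and 4 above, a namespace-scoped Role of roughly this shape grants an application read-only access to the PersistentVolumeClaims in its own namespace and nothing more. This is a minimal sketch; the namespace, role, and service account names are invented for the example.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-reader                 # hypothetical name
  namespace: team-a
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]  # no create/delete/patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvc-reader-binding
  namespace: team-a
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: team-a
roleRef:
  kind: Role
  name: pvc-reader
  apiGroup: rbac.authorization.k8s.io

Binding the role to a single service account rather than a group keeps the blast radius small if that workload is ever compromised.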
  19. We are excited to announce that Amazon EMR on EKS simplified the authentication and authorization user experience by integrating with Amazon EKS's improved cluster access management controls. With this launch, Amazon EMR on EKS will use EKS access management controls to automatically obtain the necessary permissions to run Amazon EMR applications on the EKS cluster. View the full article
  20. Introduction
Snapchat is an app that hundreds of millions of people around the world use to communicate with their close friends. The app is powered by microservice architectures deployed in Amazon Elastic Kubernetes Service (Amazon EKS), along with Amazon CloudFront and datastores such as Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and Amazon ElastiCache. This post explains how Snap builds its microservices, leveraging Amazon EKS with AWS Identity and Access Management (IAM). It also discusses how Snap protects its K8s resources with intelligent threat detection offered by GuardDuty, augmented with Falco and in-house tooling, to secure Snap's cloud-native service mesh platform.

The following figure (Figure 1) shows the main data flow when users send and receive snaps. To send a snap, the mobile app calls Snap's API Gateway, which routes the call to a Media service that persists the sender's message media in S3 (Steps 1-3). Next, the Friend microservice validates the sender's permission to Snap the recipient by querying the messaging core service (MCS), which checks whether the recipient is a friend (Steps 4-6), and the conversation is stored in Snap DB, powered by DynamoDB (Steps 7-8). To receive a snap, the mobile app calls MCS to get the message metadata, such as the pointer to the media file, and then calls the Media microservice to load the media file from the content system that persists user data, powered by CloudFront and Amazon S3 (Steps 9-11).

Figure 1: Snap's end-to-end data flow when users send and receive snaps

The original Snap service mesh design included a single-tenant microservice per EKS cluster. Snap discovered, however, that managing thousands of clusters added an operational burden as microservices grew. Additionally, they discovered that many environments were underused and unnecessarily consuming AWS account resources such as IAM roles and policies. This required enabling microservices to share clusters and redefining tenant isolation to meet security requirements. Finally, Snap wanted to limit access to microservice data and storage while keeping them centralized in a network account meshed with Google's cloud. The following figure illustrates Kubernetes-based microservices, Friends and Users, deployed in Amazon EKS or Google Kubernetes Engine (GKE). Snap users reach Snap's API Gateway through Envoy. Switchboard, Snap's mesh service configuration panel, updates Edge Envoy endpoints with available microservice resources after deploying them.

Figure 2 – Snap's high-level mesh design

Bootstrap
The purpose of this stage is the preparation and implementation of a secure multi-cloud compute provisioning system. Snap uses Kubernetes clusters as a construct that defines an environment hosting one or more microservices, such as Friends and Users in the first figure. Snap's security bootstrap includes three layers: authentication, authorization, and admission control, applied when designing a Kubernetes-based multi-cloud. Snap uses IAM roles for Kubernetes service accounts (IRSA) to provision fine-grained service identities for microservices running in shared EKS clusters, allowing access to AWS services such as Amazon S3 and DynamoDB. For operator access scoped to the K8s namespace, Snap built a tool to manage K8s RBAC that maps K8s roles to IAM, allowing developers to perform service operations following the principle of least privilege.
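As a rough sketch of what IRSA looks like in practice (the service account, namespace, and role ARN below are invented for illustration, not Snap's), a Kubernetes service account is annotated with the IAM role it is allowed to assume:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: friends-service            # hypothetical microservice service account
  namespace: friends
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/friends-dynamodb-access  # assumed role ARN

Pods running under this service account receive temporary credentials for that role only, which is how fine-grained, per-microservice access to services such as DynamoDB can be enforced in a shared cluster.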
Beyond RBAC and IRSA, Snap wanted to impose deployment validations such as making sure containers are instantiated by approved image registries (such as Amazon Elastic Container Registry (Amazon ECR) or images built and signed by approved CI systems) as well as preventing containers from running with elevated permissions. To accomplish this, Snap built admission controller webhooks. Build-time Snap believes in empowering its engineers to architect microservices autonomously within K8s constructs. Snap’s goal was to maximize Amazon EKS security benefits while abstracting K8s semantics. Specifically, 1/ The security of the Cloud safeguarding the infrastructure that runs Amazon EKS, object-stores (Amazon S3), data-stores (KeyDB, ElastiCache, and DynamoDB), and the network that interconnects them. The security in the Cloud includes 2/ protecting the K8s cluster API server and etcd from malicious access, and finally, 3/ protecting Snap’s applications’ RBAC, network policies, data encryption, and containers. Switchboard – Snap’s mesh service configuration panel Snap built a configuration hub called Switchboard to provide a single control panel across AWS and GCP to create K8s clusters. Microservices owners can define environments in regions with specific compute types offered by cloud providers. Switchboard also enables service owners to follow approval workflows to establish trust for service-to-service authentication and specify public routes and other service configurations. It allows service owners to manage service dependencies, traffic routes between K8s clusters. Switchboard presents a simplified configuration model based on the environments. It manages metadata, such as the service owner, team email, and on-call paging information. Snap wanted to empower tenants to control access to microservice data objects (images, audio, and video) and metadata stores such as databases and cache stores so they deployed the data stores in separate data accounts controlled by IAM roles and policies in those accounts. Snap needed to centralize microservices network paths to mesh with GCP resources. Therefore, Snap deployed the microservices in a centralized account using IAM roles for service accounts that assume roles the tenants’ data AWS accounts. The following figure shows how multiple environments (Kubernetes cluster) host three tenants using two different IRSA. Friends’ Service A can read and write to a dedicated DynamoDB table deployed in a separate AWS account. Similarly, MCS’ Service B can get and cache sessions or friends in ElastiCache. Figure 2 – Snap’s high mesh design One of Snap’s design principles was to maximize autonomy while maintaining their desired level of isolation between environments, all while minimizing operational overhead. Snap chose Kubernetes service accounts as the minimal isolation level. Amazon EKS support for IRSA allowed Snap to leverage OIDC to simplify the process of granting IAM permissions to application pods. Snap also uses RBAC to limit access to K8s cluster resources and secure cluster users’ authentication. Snap considers adopting Amazon EKS Pod Identities to reuse associations when running the same application in multiple clusters. This is done by applying identical associations to each cluster without modifying the role trust policy. Deployment-time Cluster access by human operators AWS IAM users and roles are currently managed by Snap which generates policies based on business requirements. Operators use Switchboard to request access to their microservice. 
Switchboard maps an IAM user to a cluster RBAC policy that grants access to Kubernetes objects. Snap is evaluating AWS IAM Identity Center to allow Switchboard to federate AWS Single Sign-On (SSO) with a central identity provider (IdP), enabling cluster operators to have least-privilege access using cluster RBAC policies enforced through AWS IAM.

Isolation strategy
Snap chose to isolate K8s cluster resources by namespace, limiting container permissions with IAM roles for service accounts and CNI network policies. In addition, Snap provisions separate pod identities for add-ons such as the CNI, Cluster Autoscaler, and Fluentd. Add-ons use separate IAM policies through IRSA rather than overly permissive EC2 instance IAM roles.

Network partitioning
Snap's mesh defines rules that restrict or permit network traffic between microservice pods with Amazon VPC CNI network policies. Snap wanted to minimize IP exhaustion caused by IPv4 address space limitations due to its massive scale. Instead of working around IPv4 limitations using Kubernetes IPv4/IPv6 dual-stack, Snap wanted to migrate gracefully to IPv6. Snap can connect IPv4-based Amazon EKS clusters to IPv6 clusters using Amazon EKS IPv6 support and the Amazon VPC CNI.

Container hardening
Snap built an admission controller webhook to audit and enforce pod security context, preventing containers from running with elevated permissions (RunAs) or accessing volumes at the cluster or namespace level. Snap validates that workloads don't use configurations that break container isolation, such as hostIPC, hostNetwork, and hostPort.

Figure 4 – Snap's admission controller service

Network policies
Kubernetes Network Policies enable you to define and enforce rules for traffic flow between pods. Policies act as a virtual firewall, which allows you to segment and secure your cluster by specifying network traffic rules for pods, namespaces, IP addresses, and ports. Amazon EKS extends and simplifies native support for network policies in the Amazon VPC CNI through the upstream Kubernetes NetworkPolicy API, complementing Amazon EC2 security groups and network access control lists (NACLs).

Run-time
Audit logs
Snap needs to audit system activities to enhance compliance, intrusion detection, and policy validation, tracking unauthorized access, policy violations, suspicious activities, and incident responses. Snap uses Amazon EKS control plane logging, which ingests API server, audit, authenticator, controller manager, and scheduler logs into CloudWatch. It also uses AWS CloudTrail for cross-AWS-service access and Fluentd to ingest application logs into CloudWatch and Google's operations suite.

Runtime security monitoring
Snap has begun using GuardDuty EKS Protection. This helps Snap monitor EKS cluster control plane activity by analyzing Amazon EKS audit logs to identify unauthorized and malicious access patterns. This functionality, combined with their admission controller events, provides coverage of cluster changes. For runtime monitoring, Snap uses the open source Falco agent to monitor EKS workloads in the Snap service mesh. GuardDuty findings are contextualized by Falco rules based on the processes running in containers. This context helps identify the cluster tenants with whom to triage the findings. Falco agents support Snap's runtime monitoring goals and deliver consistent reporting. Snap complements GuardDuty with Falco to ensure changes are not made to a running container by monitoring and analyzing container syscalls (container drift detection rule).
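Returning briefly to the network-policy model described above, the rules follow the standard Kubernetes NetworkPolicy API. A minimal sketch (labels, namespace, and port are invented for illustration, not taken from Snap's configuration) that only lets an edge/gateway tier reach a microservice's pods might look like this:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-only         # hypothetical policy name
  namespace: friends
spec:
  podSelector:
    matchLabels:
      app: friends
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          tier: edge               # assumed label on the gateway namespace
    ports:
    - protocol: TCP
      port: 8080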
Conclusion
Snap's cloud infrastructure has evolved from running a monolith inside Google App Engine to microservices deployed in Kubernetes across AWS and GCP. This streamlined architecture helped improve Snapchat's reliability. Snap's Kubernetes multi-tenant vision required abstracting cloud provider security semantics, such as AWS security features, to comply with strict security and privacy standards. This post reviewed the methods and systems used to implement a secure compute and data platform on Amazon EKS and Amazon datastores, covering bootstrapping, building, deploying, and running Snap's workloads. Snap is not stopping here. Learn more about Snap and our collaboration with Snap. View the full article
  21. Today, we are announcing the general availability of provider-defined functions in the AWS, Google Cloud, and Kubernetes providers in conjunction with the HashiCorp Terraform 1.8 launch. This release represents yet another step forward in our unique approach to ecosystem extensibility. Provider-defined functions will allow anyone in the Terraform community to build custom functions within providers and extend the capabilities of Terraform.

Introducing provider-defined functions
Previously, users relied on a handful of built-in functions in the Terraform configuration language to perform a variety of tasks, including numeric calculations, string manipulations, collection transformations, validations, and other operations. However, the Terraform community needed more capabilities than the built-in functions could offer. With the release of Terraform 1.8, providers can implement custom functions that you can call from the Terraform configuration. The schema for a function is defined within the provider's schema using the Terraform provider plugin framework. To use a function, declare the provider as a required_provider in the terraform{} block:

terraform {
  required_version = ">= 1.8.0"
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "2.5.1"
    }
  }
}

Provider-defined functions can perform multiple tasks, including:
Transforming existing data
Parsing combined data into individual, referenceable components
Building combined data from individual components
Simplifying validations and assertions

To access a provider-defined function, reference the provider:: namespace with the local name of the Terraform provider. For example, you can use the direxists function by including provider::local::direxists() in your Terraform configuration. Below you'll find several examples of new provider-defined functions in the officially supported AWS, Google Cloud, and Kubernetes providers.

Terraform AWS provider
The 5.40 release of the Terraform AWS provider includes its first provider-defined functions to parse and build Amazon Resource Names (ARNs), simplifying Terraform configurations where ARN manipulation is required. The arn_parse provider-defined function is used to parse an ARN and return an object of individual referenceable components, such as a region or account identifier. For example, to get the AWS account ID from an Amazon Elastic Container Registry (ECR) repository, use the arn_parse function to retrieve the account ID and set it as an output:

# create an ECR repository
resource "aws_ecr_repository" "hashicups" {
  name = "hashicups"
  image_scanning_configuration {
    scan_on_push = true
  }
}

# output the account ID of the ECR repository
output "hashicups_ecr_repository_account_id" {
  value = provider::aws::arn_parse(aws_ecr_repository.hashicups.arn).account_id
}

Running terraform apply against the above configuration outputs the AWS account ID:

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

hashicups_ecr_repository_account_id = "751192555662"

Without the arn_parse function, you would need to define and test a combination of built-in Terraform functions to split the ARN and reference the proper index, or define a regular expression to match on a substring. The function handles the parsing for you in a concise manner so that you do not have to worry about doing this yourself. The AWS provider also includes a new arn_build function that builds an ARN from individual attributes and returns it as a string.
This provider-defined function can create an ARN that you cannot reference from another resource. For example, you may want to allow another account to pull images from your ECR repository. The arn_build function below constructs an ARN for an IAM policy using an account ID:

# allow another account to pull from the ECR repository
data "aws_iam_policy_document" "cross_account_pull_ecr" {
  statement {
    sid    = "AllowCrossAccountPull"
    effect = "Allow"
    principals {
      type = "AWS"
      identifiers = [
        provider::aws::arn_build("aws", "iam", "", var.cross_account_id, "root"),
      ]
    }
    actions = [
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]
  }
}

The arn_build function helps to guide and simplify the process of combining substrings to form an ARN, and it improves readability compared to using string interpolation. Without it, you'd have to look up the exact ARN structure in the AWS documentation and manually test it.

Terraform Google Cloud provider
The 5.23 release of the Terraform Google Cloud provider adds a simplified way to get regions, zones, names, and projects from the IDs of resources that aren't managed by your Terraform configuration. Provider-defined functions can now help parse Google IDs when adding an IAM binding to a resource that's managed outside of Terraform:

resource "google_cloud_run_service_iam_member" "example_run_invoker_jane" {
  member   = "user:jane@example.com"
  role     = "run.invoker"
  service  = provider::google::name_from_id(var.example_cloud_run_service_id)
  location = provider::google::location_from_id(var.example_cloud_run_service_id)
  project  = provider::google::project_from_id(var.example_cloud_run_service_id)
}

The Google Cloud provider also includes a new region_from_zone provider-defined function that helps obtain region names from a given zone (e.g. "us-west1" from "us-west1-a"). This simple string processing could be achieved in multiple ways using Terraform's built-in functions previously, but the new function simplifies the process:

locals {
  zone = "us-central1-a"

  # ways to derive the region "us-central1" using built-in functions
  region_1 = join("-", slice(split("-", local.zone), 0, 2))
  region_2 = substr(local.zone, 0, length(local.zone)-2)

  # our new region_from_zone function makes this easier!
  region_3 = provider::google::region_from_zone(local.zone)
}

Terraform Kubernetes provider
The 2.28 release of the Terraform Kubernetes provider includes provider-defined functions for encoding and decoding Kubernetes manifests into Terraform, making it easier for practitioners to work with the kubernetes_manifest resource. Users that have a Kubernetes manifest in YAML format can use the manifest_decode function to convert it into a Terraform object. The example below shows how to use the manifest_decode function by referring to a Kubernetes manifest in YAML format embedded in the Terraform configuration:

locals { manifest = <

If you prefer to decode a YAML file instead of using an embedded YAML format, you can do so by combining the built-in file function with the manifest_decode function.
$ cat manifest.yaml
---
kind: Namespace
apiVersion: v1
metadata:
  name: test
  labels:
    name: test

resource "kubernetes_manifest" "example" {
  manifest = provider::kubernetes::manifest_decode(file("${path.module}/manifest.yaml"))
}

If your manifest YAML contains multiple Kubernetes resources, you may use the manifest_decode_multi function to decode them into a list, which can then be used with the for_each attribute on the kubernetes_manifest resource:

$ cat manifest.yaml
---
kind: Namespace
apiVersion: v1
metadata:
  name: test-1
  labels:
    name: test-1
---
kind: Namespace
apiVersion: v1
metadata:
  name: test-2
  labels:
    name: test-2

resource "kubernetes_manifest" "example" {
  for_each = {
    for m in provider::kubernetes::manifest_decode_multi(file("${path.module}/manifest.yaml")) : m.metadata.name => m
  }
  manifest = each.value
}

Getting started with provider-defined functions
Provider-defined functions allow Terraform configurations to become more expressive and readable by declaring practitioner intent and reducing complex, repetitive expressions. To learn about all of the new launch-day provider-defined functions, please review the documentation and changelogs of the aforementioned providers: Terraform AWS provider; Terraform Google provider; Terraform Kubernetes provider. Review our Terraform Plugin Framework documentation to learn more about how provider-defined functions work and how you can make your own. We are thankful to our partners and community members for their valuable contributions to the HashiCorp Terraform ecosystem. View the full article
  22. As a data scientist or machine learning engineer, you’re constantly challenged with building accurate models and deploying and scaling them effectively. The demand for AI-driven solutions is skyrocketing, and mastering the art of scaling machine learning (ML) applications has become more critical than ever. This is where Kubernetes emerges as a game-changer, often abbreviated as K8s. In this blog, we’ll see how you can leverage Kubernetes to scale machine learning applications. Understanding Kubernetes for ML applications Kubernetes or K8s provides a framework for automating the deployment and management of containerized applications. Its architecture revolves around clusters composed of physical or virtual machine nodes. Within these clusters, Kubernetes manages containers via Pods, the most minor deployable units that can hold one or more containers. One significant advantage of Kubernetes for machine learning applications is its ability to handle dynamic workloads efficiently. With features like auto-scaling, load balancing, and service discovery, Kubernetes ensures that your ML models can scale to meet varying demands. Understanding TensorFlow The open-source framework TensorFlow, developed by Google, is used to build and train machine learning models. TensorFlow integrates with Kubernetes, allowing you to deploy and manage TensorFlow models at scale. Deploying TensorFlow on Kubernetes involves containerizing your TensorFlow application and defining Kubernetes resources such as Deployments and Services. By utilizing Kubernetes features like horizontal pod autoscaling, you can automatically scale the number of TensorFlow serving instances based on the incoming request traffic, ensuring optimal performance under varying workloads. Exploring PyTorch Facebook’s PyTorch, developed by Facebook, is popular among researchers and developers because of its dynamic computational graph and easy-to-use API. Like TensorFlow, PyTorch can be deployed on Kubernetes clusters, offering flexibility and ease of use for building and deploying deep learning models. Deploying PyTorch models on Kubernetes involves packaging your PyTorch application into containers and defining Kubernetes resources to manage deployment. While PyTorch may have a slightly different workflow than TensorFlow, it offers similar scalability benefits when deployed on Kubernetes. Best practices for scaling ML applications on Kubernetes You can deploy TensorFlow on Kubernetes using various methods, such as StatefulSets and DaemonSets. Together, TensorFlow and Kubernetes provide a powerful platform for building and deploying large-scale machine learning applications. With Kubernetes handling infrastructure management and TensorFlow offering advanced machine learning capabilities, you can efficiently scale your ML applications to meet the demands of modern businesses. Follow these best practices for scaling ML applications: Containerization of ML models: Begin by containerizing your ML models using Docker. This process involves encapsulating your model, its dependencies, and any necessary preprocessing or post-processing steps into a Docker container. This ensures that your ML model can run consistently across different environments. Utilize Kubernetes operators: Kubernetes Operators are custom controllers that extend Kubernetes’ functionality to automate complex tasks. Leveraging Operators specific to TensorFlow or PyTorch can streamline the deployment and management of ML workloads on Kubernetes. 
These Operators handle scaling, monitoring, and automatic update rollout, reducing operational overhead. Horizontal Pod Autoscaling (HPA): You can implement HPA to adjust the number of replicas based on CPU or memory usage. This allows your ML application to scale up or down in response to changes in workload, ensuring optimal performance and resource utilization. Resource requests and limits: You can effectively manage resource allocation by defining requests and limits for your Kubernetes pods. Resource requests specify the amount of CPU and memory required by each pod, while limits prevent pods from exceeding a certain threshold. Tuning these parameters ensures that your ML application receives sufficient resources without impacting other workloads running on the cluster. Distributed training and inference: Consider distributed training and inference techniques to distribute computation across multiple nodes for large-scale ML workloads. Kubernetes facilitates the orchestration of distributed training jobs by coordinating the execution of tasks across pods. The APIs in TensorFlow and PyTorch enable the effective use of cluster resources. Model versioning and rollbacks: Implement versioning mechanisms for your ML models to enable easy rollback in case of issues with new releases. Kubernetes’ declarative approach to configuration management lets you define desired state configurations for your ML deployments. By versioning these configurations and leveraging features like Kubernetes’ Deployment Rollback, you can quickly revert to a previous model version if necessary. Monitoring and logging: Monitoring and logging solutions give you insights into the performance of your ML applications. Monitoring metrics such as request latency, error rates, and resource utilization help you identify bottlenecks and optimize performance. Security and compliance: Ensure that your ML deployments on Kubernetes adhere to security best practices and compliance requirements. Implement security measures such as pod security policies and role-based access control (RBAC) to control access and protect sensitive data. Regularly update dependencies and container images to patch vulnerabilities and mitigate security risks. Scaling ML applications on Kubernetes Deploying machine learning applications on Kubernetes offers a scalable and efficient solution for managing complex workloads in production environments. By following best practices such as containerization, leveraging Kubernetes Operators, implementing autoscaling, and optimizing resource utilization, organizations can harness the full potential of frameworks like TensorFlow or PyTorch to scale their ML applications effectively. Integrating Kubernetes with distributed training techniques enables efficient utilization of cluster resources while versioning mechanisms and monitoring solutions ensure reliability and performance. By embracing these best practices, organizations can deploy resilient, scalable, and high-performance ML applications that meet the demands of modern business environments. The post Tensorflow or PyTorch + K8s = ML apps at scale appeared first on Amazic. View the full article
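To tie the HPA and resource-limit practices above together, here is a minimal sketch for a model-serving Deployment with CPU-based autoscaling. The names, image, port, and thresholds are assumptions for illustration, not recommendations from the article.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server               # hypothetical serving deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: serving
        image: registry.example.com/ml/serving:1.0   # assumed image
        ports:
        - containerPort: 8501
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            cpu: "4"
            memory: 8Gi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # scale out when average CPU usage exceeds 70%

Setting explicit requests and limits is what makes the utilization target meaningful, since the HPA computes usage as a percentage of the requested CPU.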
  23. Multi-cluster Ingress (MCI) is an advanced feature typically used in cloud computing environments that enables the management of ingress (the entry point for external traffic into a network) across multiple Kubernetes clusters. This functionality is especially useful for applications that are deployed globally across several regions or clusters, offering a unified method to manage access to these applications. MCI simplifies the process of routing external traffic to the appropriate cluster, enhancing both the reliability and scalability of applications. Here are key features and benefits of Multi-cluster Ingress:

Global Load Balancing: MCI can intelligently route traffic to different clusters based on factors like region, latency, and health of the service. This ensures users are directed to the nearest or best-performing cluster, improving the overall user experience.
Centralized Management: It allows for the configuration of ingress rules from a single point, even though these rules are applied across multiple clusters. This simplification reduces the complexity of managing global applications.
High Availability and Redundancy: By spreading resources across multiple clusters, MCI enhances the availability and fault tolerance of applications. If one cluster goes down, traffic can be automatically rerouted to another healthy cluster.
Cross-Region Failover: In the event of a regional outage or a significant drop in performance within one cluster, MCI can perform automatic failover to another cluster in a different region, ensuring continuous availability of services.
Cost Efficiency: MCI helps optimize resource utilization across clusters, potentially leading to cost savings. Traffic can be routed to clusters where resources are less expensive or more abundant.
Simplified DNS Management: Typically, MCI solutions offer integrated DNS management, automatically updating DNS records based on the health and location of clusters. This removes the need for manual DNS management in a multi-cluster setup.

How does Multi-cluster Ingress (MCI) work?
Multi-cluster Ingress (MCI) works by managing and routing external traffic into applications running across multiple Kubernetes clusters. This process involves several components and steps to ensure that traffic is efficiently and securely routed to the appropriate destination based on predefined rules and policies. Here's a high-level overview of how MCI operates:

1. Deployment Across Multiple Clusters
Clusters Preparation: You deploy your application across multiple Kubernetes clusters, often spread across different geographical locations or cloud regions, to ensure high availability and resilience.
Ingress Configuration: Each cluster has its own set of resources and services that need to be exposed externally. With MCI, you configure ingress resources that are aware of the multi-cluster environment.

2. Central Management and Configuration
Unified Ingress Control: A central control plane is used to manage the ingress resources across all participating clusters. This is where you define the rules for how external traffic should be routed to your services.
DNS and Global Load Balancer Setup: MCI integrates with global load balancers and DNS systems to direct users to the closest or most appropriate cluster based on various criteria like location, latency, and the health of the clusters.

3. Traffic Routing
Initial Request: When a user makes a request to access the application, DNS resolution directs the request to the global load balancer.
Global Load Balancing: The global load balancer evaluates the request against the configured routing rules and the current state of the clusters (e.g., load, health). It then selects the optimal cluster to handle the request. Cluster Selection: The criteria for cluster selection can include geographic proximity to the user, the health and capacity of the clusters, and other custom rules defined in the MCI configuration. Request Forwarding: Once the optimal cluster is selected, the global load balancer forwards the request to an ingress controller in that cluster. Service Routing: The ingress controller within the chosen cluster then routes the request to the appropriate service based on the path, host, or other headers in the request. 4. Health Checks and Failover Continuous Monitoring: MCI continuously monitors the health and performance of all clusters and their services. This includes performing health checks and monitoring metrics to ensure each service is functioning correctly. Failover and Redundancy: In case a cluster becomes unhealthy or is unable to handle additional traffic, MCI automatically reroutes traffic to another healthy cluster, ensuring uninterrupted access to the application. 5. Scalability and Maintenance Dynamic Scaling: As traffic patterns change or as clusters are added or removed, MCI dynamically adjusts routing rules and load balancing to optimize performance and resource utilization. Configuration Updates: Changes to the application or its deployment across clusters can be managed centrally through the MCI configuration, simplifying updates and maintenance. Example Deployment YAML for Multi-cluster Ingress with FrontendConfig and BackendConfig This example includes: A simple web application deployment. A service to expose the application within the cluster. A MultiClusterService to expose the service across clusters. A MultiClusterIngress to expose the service externally with FrontendConfig and BackendConfig. The post What is Multi-cluster Ingress (MCI) appeared first on DevOpsSchool.com. View the full article
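The example YAML referenced at the end of the post does not appear in this excerpt. As a rough sketch only (assuming GKE's networking.gke.io/v1 API, with all names invented, and omitting the FrontendConfig/BackendConfig attachments), a MultiClusterService and MultiClusterIngress pair typically looks like this:

apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: web-mcs                    # hypothetical name
  namespace: web
spec:
  template:
    spec:
      selector:
        app: web                   # matches the Deployment's pod labels in each cluster
      ports:
      - name: http
        protocol: TCP
        port: 8080
        targetPort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: web-ingress                # hypothetical name
  namespace: web
spec:
  template:
    spec:
      backend:
        serviceName: web-mcs
        servicePort: 8080

The MultiClusterService selects the same application in every member cluster, and the MultiClusterIngress then exposes it behind a single global load balancer, which is the routing behavior described above.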
  24. Author: Andrei Kvapil (Ænix) Approaching the most interesting phase, this article delves into running Kubernetes within Kubernetes. Technologies such as Kamaji and Cluster API are highlighted, along with their integration with KubeVirt. Previous discussions have covered preparing Kubernetes on bare metal and how to turn Kubernetes into virtual machines management system. This article concludes the series by explaining how, using all of the above, you can build a full-fledged managed Kubernetes and run virtual Kubernetes clusters with just a click. First up, let's dive into the Cluster API. Cluster API Cluster API is an extension for Kubernetes that allows the management of Kubernetes clusters as custom resources within another Kubernetes cluster. The main goal of the Cluster API is to provide a unified interface for describing the basic entities of a Kubernetes cluster and managing their lifecycle. This enables the automation of processes for creating, updating, and deleting clusters, simplifying scaling, and infrastructure management. Within the context of Cluster API, there are two terms: management cluster and tenant clusters. Management cluster is a Kubernetes cluster used to deploy and manage other clusters. This cluster contains all the necessary Cluster API components and is responsible for describing, creating, and updating tenant clusters. It is often used just for this purpose. Tenant clusters are the user clusters or clusters deployed using the Cluster API. They are created by describing the relevant resources in the management cluster. They are then used for deploying applications and services by end-users. It's important to understand that physically, tenant clusters do not necessarily have to run on the same infrastructure with the management cluster; more often, they are running elsewhere. A diagram showing interaction of management Kubernetes cluster and tenant Kubernetes clusters using Cluster API For its operation, Cluster API utilizes the concept of providers which are separate controllers responsible for specific components of the cluster being created. Within Cluster API, there are several types of providers. The major ones are: Infrastructure Provider, which is responsible for providing the computing infrastructure, such as virtual machines or physical servers. Control Plane Provider, which provides the Kubernetes control plane, namely the components kube-apiserver, kube-scheduler, and kube-controller-manager. Bootstrap Provider, which is used for generating cloud-init configuration for the virtual machines and servers being created. To get started, you will need to install the Cluster API itself and one provider of each type. You can find a complete list of supported providers in the project's documentation. For installation, you can use the clusterctl utility, or Cluster API Operator as the more declarative method. Choosing providers Infrastructure provider To run Kubernetes clusters using KubeVirt, the KubeVirt Infrastructure Provider must be installed. It enables the deployment of virtual machines for worker nodes in the same management cluster, where the Cluster API operates. Control plane provider The Kamaji project offers a ready solution for running the Kubernetes control plane for tenant clusters as containers within the management cluster. 
This approach has several significant advantages: Cost-effectiveness: Running the control plane in containers avoids the use of separate control plane nodes for each cluster, thereby significantly reducing infrastructure costs. Stability: Simplifying architecture by eliminating complex multi-layered deployment schemes. Instead of sequentially launching a virtual machine and then installing etcd and Kubernetes components inside it, there's a simple control plane that is deployed and run as a regular application inside Kubernetes and managed by an operator. Security: The cluster's control plane is hidden from the end user, reducing the possibility of its components being compromised, and also eliminates user access to the cluster's certificate store. This approach to organizing a control plane invisible to the user is often used by cloud providers. Bootstrap provider Kubeadm as the Bootstrap Provider - as the standard method for preparing clusters in Cluster API. This provider is developed as part of the Cluster API itself. It requires only a prepared system image with kubelet and kubeadm installed and allows generating configs in the cloud-init and ignition formats. It's worth noting that Talos Linux also supports provisioning via the Cluster API and has providers for this. Although previous articles discussed using Talos Linux to set up a management cluster on bare-metal nodes, to provision tenant clusters the Kamaji+Kubeadm approach has more advantages. It facilitates the deployment of Kubernetes control planes in containers, thus removing the need for separate virtual machines for control plane instances. This simplifies the management and reduces costs. How it works The primary object in Cluster API is the Cluster resource, which acts as the parent for all the others. Typically, this resource references two others: a resource describing the control plane and a resource describing the infrastructure, each managed by a separate provider. Unlike the Cluster, these two resources are not standardized, and their kind depends on the specific provider you are using: A diagram showing the relationship of a Cluster resource and the resources it links to in Cluster API Within Cluster API, there is also a resource named MachineDeployment, which describes a group of nodes, whether they are physical servers or virtual machines. This resource functions similarly to standard Kubernetes resources such as Deployment, ReplicaSet, and Pod, providing a mechanism for the declarative description of a group of nodes and automatic scaling. In other words, the MachineDeployment resource allows you to declaratively describe nodes for your cluster, automating their creation, deletion, and updating according to specified parameters and the requested number of replicas. 
A diagram showing the relationship of a MachineDeployment resource and its children in Cluster API To create machines, MachineDeployment refers to a template for generating the machine itself and a template for generating its cloud-init config: A diagram showing the relationship of a MachineDeployment resource and the resources it links to in Cluster API To deploy a new Kubernetes cluster using Cluster API, you will need to prepare the following set of resources: A general Cluster resource A KamajiControlPlane resource, responsible for the control plane operated by Kamaji A KubevirtCluster resource, describing the cluster configuration in KubeVirt A KubevirtMachineTemplate resource, responsible for the virtual machine template A KubeadmConfigTemplate resource, responsible for generating tokens and cloud-init At least one MachineDeployment to create some workers Polishing the cluster In most cases, this is sufficient, but depending on the providers used, you may need other resources as well. You can find examples of the resources created for each type of provider in the Kamaji project documentation. At this stage, you already have a ready tenant Kubernetes cluster, but so far, it contains nothing but API workers and a few core plugins that are standardly included in the installation of any Kubernetes cluster: kube-proxy and CoreDNS. For full integration, you will need to install several more components: To install additional components, you can use a separate Cluster API Add-on Provider for Helm, or the same FluxCD discussed in previous articles. When creating resources in FluxCD, it's possible to specify the target cluster by referring to the kubeconfig generated by Cluster API. Then, the installation will be performed directly into it. Thus, FluxCD becomes a universal tool for managing resources both in the management cluster and in the user tenant clusters. A diagram showing the interaction scheme of fluxcd, which can install components in both management and tenant Kubernetes clusters What components are being discussed here? Generally, the set includes the following: CNI Plugin To ensure communication between pods in a tenant Kubernetes cluster, it's necessary to deploy a CNI plugin. This plugin creates a virtual network that allows pods to interact with each other and is traditionally deployed as a Daemonset on the cluster's worker nodes. You can choose and install any CNI plugin that you find suitable. A diagram showing a CNI plugin installed inside the tenant Kubernetes cluster on a scheme of nested Kubernetes clusters Cloud Controller Manager The main task of the Cloud Controller Manager (CCM) is to integrate Kubernetes with the cloud infrastructure provider's environment (in your case, it is the management Kubernetes cluster in which all worksers of tenant Kubernetes are provisioned). Here are some tasks it performs: When a service of type LoadBalancer is created, the CCM initiates the process of creating a cloud load balancer, which directs traffic to your Kubernetes cluster. If a node is removed from the cloud infrastructure, the CCM ensures its removal from your cluster as well, maintaining the cluster's current state. When using the CCM, nodes are added to the cluster with a special taint, node.cloudprovider.kubernetes.io/uninitialized, which allows for the processing of additional business logic if necessary. After successful initialization, this taint is removed from the node. Depending on the cloud provider, the CCM can operate both inside and outside the tenant cluster. 
The KubeVirt Cloud Provider is designed to be installed in the external parent management cluster. Thus, creating services of type LoadBalancer in the tenant cluster initiates the creation of LoadBalancer services in the parent cluster, which direct traffic into the tenant cluster. A diagram showing a Cloud Controller Manager installed outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters and the mapping of services it manages from the parent to the child Kubernetes cluster CSI Driver The Container Storage Interface (CSI) is divided into two main parts for interacting with storage in Kubernetes: csi-controller: This component is responsible for interacting with the cloud provider's API to create, delete, attach, detach, and resize volumes. csi-node: This component runs on each node and facilitates the mounting of volumes to pods as requested by kubelet. In the context of using the KubeVirt CSI Driver, a unique opportunity arises. Since virtual machines in KubeVirt runs within the management Kubernetes cluster, where a full-fledged Kubernetes API is available, this opens the path for running the csi-controller outside of the user's tenant cluster. This approach is popular in the KubeVirt community and offers several key advantages: Security: This method hides the internal cloud API from the end-user, providing access to resources exclusively through the Kubernetes interface. Thus, it reduces the risk of direct access to the management cluster from user clusters. Simplicity and Convenience: Users don't need to manage additional controllers in their clusters, simplifying the architecture and reducing the management burden. However, the CSI-node must necessarily run inside the tenant cluster, as it directly interacts with kubelet on each node. This component is responsible for the mounting and unmounting of volumes into pods, requiring close integration with processes occurring directly on the cluster nodes. The KubeVirt CSI Driver acts as a proxy for ordering volumes. When a PVC is created inside the tenant cluster, a PVC is created in the management cluster, and then the created PV is connected to the virtual machine. A diagram showing a CSI plugin components installed on both inside and outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters and the mapping of persistent volumes it manages from the parent to the child Kubernetes cluster Cluster Autoscaler The Cluster Autoscaler is a versatile component that can work with various cloud APIs, and its integration with Cluster-API is just one of the available functions. For proper configuration, it requires access to two clusters: the tenant cluster, to track pods and determine the need for adding new nodes, and the managing Kubernetes cluster (management kubernetes cluster), where it interacts with the MachineDeployment resource and adjusts the number of replicas. Although Cluster Autoscaler usually runs inside the tenant Kubernetes cluster, in this situation, it is suggested to install it outside for the same reasons described before. This approach is simpler to maintain and more secure as it prevents users of tenant clusters from accessing the management API of the management cluster. A diagram showing a Cluster Autoscaler installed outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters Konnectivity There's another additional component I'd like to mention - Konnectivity. 
Konnectivity

There's another additional component I'd like to mention: Konnectivity. You will likely need it later on to get webhooks and the API aggregation layer working in your tenant Kubernetes cluster. This topic is covered in detail in one of my previous articles. Unlike the components presented above, Kamaji allows you to easily enable Konnectivity and manage it as one of the core components of your tenant cluster, alongside kube-proxy and CoreDNS.
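As a loose sketch only, assuming the KamajiControlPlane resource exposes the same addons block as Kamaji's TenantControlPlane (field names and API versions vary between releases), enabling Konnectivity next to the other core components might look roughly like this:

```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # version depends on the Kamaji control plane provider release
kind: KamajiControlPlane
metadata:
  name: demo                  # illustrative name
  namespace: tenant-demo      # illustrative namespace
spec:
  replicas: 2
  version: 1.29.0
  addons:
    coreDNS: {}               # managed by Kamaji as core components of the tenant cluster
    kubeProxy: {}
    konnectivity: {}          # enables the konnectivity server/agent pair for webhooks and API aggregation
```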
Conclusion

Now you have a fully functional Kubernetes cluster with the capability for dynamic scaling, automatic provisioning of volumes, and load balancers. Going forward, you might consider collecting metrics and logs from your tenant clusters, but that goes beyond the scope of this article.

Of course, all the components necessary for deploying a Kubernetes cluster can be packaged into a single Helm chart and deployed as a unified application. This is precisely how we organize the deployment of managed Kubernetes clusters with the click of a button on our open PaaS platform, Cozystack, where you can try all the technologies described in the article for free.

View the full article

  25. Author: Andrei Kvapil (Ænix)

Continuing our series of posts on how to build your own cloud using just the Kubernetes ecosystem. In the previous article, we explained how we prepare a basic Kubernetes distribution based on Talos Linux and Flux CD. In this article, we'll show you several virtualization technologies in Kubernetes and prepare everything needed to run virtual machines in Kubernetes, primarily storage and networking. We will talk about technologies such as KubeVirt, LINSTOR, and Kube-OVN.

But first, let's explain what virtual machines are needed for, and why you can't just use Docker containers for building a cloud. The reason is that containers do not provide a sufficient level of isolation. Although the situation improves year by year, we often encounter vulnerabilities that allow escaping the container sandbox and elevating privileges in the system. On the other hand, Kubernetes was not originally designed to be a multi-tenant system, meaning the basic usage pattern involves creating a separate Kubernetes cluster for every independent project and development team.

Virtual machines are the primary means of isolating tenants from each other in a cloud environment. In virtual machines, users can execute code and programs with administrative privileges, but this doesn't affect other tenants or the environment itself. In other words, virtual machines allow you to achieve hard multi-tenancy isolation and to run in environments where tenants do not trust each other.

Virtualization technologies in Kubernetes

There are several different technologies that bring virtualization into the Kubernetes world: KubeVirt and Kata Containers are the most popular ones. But you should know that they work differently.

Kata Containers implements the CRI (Container Runtime Interface) and provides an additional level of isolation for standard containers by running them in virtual machines. But these containers still run in the same, single Kubernetes cluster.

A diagram showing how container isolation is ensured by running containers in virtual machines with Kata Containers

KubeVirt allows running traditional virtual machines using the Kubernetes API. KubeVirt virtual machines run as regular Linux processes in containers. In other words, in KubeVirt, a container is used as a sandbox for running virtual machine (QEMU) processes. This can be clearly seen in the figure below, by looking at how live migration of virtual machines is implemented in KubeVirt: when migration is needed, the virtual machine moves from one container to another.

A diagram showing live migration of a virtual machine from one container to another in KubeVirt

There is also an alternative project, Virtink, which implements lightweight virtualization using Cloud Hypervisor and is initially focused on running virtual Kubernetes clusters using the Cluster API.

Considering our goals, we decided to use KubeVirt as the most popular project in this area. Besides, we have extensive expertise in it and have already made a lot of contributions to KubeVirt. KubeVirt is easy to install and allows you to run virtual machines out of the box using the containerDisk feature, which lets you store and distribute VM images directly as OCI images from a container image registry. Virtual machines with containerDisk are well suited for creating Kubernetes worker nodes and other VMs that do not require state persistence.
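For illustration, a stateless VM backed by a containerDisk can be declared as follows; the name and image below are placeholders, not something mandated by the article:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-worker           # illustrative name
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 2
        memory:
          guest: 2Gi
        devices:
          disks:
            - name: containerdisk
              disk:
                bus: virtio
      volumes:
        - name: containerdisk
          containerDisk:
            # The root disk is pulled as an OCI image; nothing is persisted across restarts
            image: quay.io/containerdisks/ubuntu:22.04   # example image, substitute your own
```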
For managing persistent data, KubeVirt offers a separate tool, the Containerized Data Importer (CDI). It allows for cloning PVCs and populating them with data from base images. The CDI is necessary if you want to automatically provision persistent volumes for your virtual machines, and it is also required by the KubeVirt CSI Driver, which handles persistent volume claims coming from tenant Kubernetes clusters. But first, you have to decide where and how you will store this data.

Storage for Kubernetes VMs

With the introduction of the CSI (Container Storage Interface), a wide range of technologies that integrate with Kubernetes has become available. In fact, KubeVirt fully utilizes the CSI interface, aligning the choice of storage for virtualization closely with the choice of storage for Kubernetes itself. However, there are nuances you need to consider. Unlike containers, which typically use a standard filesystem, block devices are more efficient for virtual machines. Although the CSI interface in Kubernetes allows requesting both types of volumes, filesystems and block devices, it's important to verify that your storage backend supports this.

Using block devices for virtual machines eliminates the need for an additional abstraction layer, such as a filesystem, which makes them more performant and in most cases enables the use of the ReadWriteMany mode. This mode allows concurrent access to the volume from multiple nodes, which is a critical feature for enabling the live migration of virtual machines in KubeVirt.

The storage system can be external or internal (in the case of hyper-converged infrastructure). Using external storage in many cases makes the whole system more stable, as your data is stored separately from the compute nodes.

A diagram showing external data storage communicating with the compute nodes

External storage solutions are often popular in enterprise systems because such storage is frequently provided by an external vendor that takes care of its operation. The integration with Kubernetes involves only a small component installed in the cluster, the CSI driver, which is responsible for provisioning volumes in this storage and attaching them to pods run by Kubernetes. However, such storage solutions can also be implemented using purely open-source technologies. One of the popular solutions is TrueNAS powered by the democratic-csi driver.

A diagram showing local data storage running on the compute nodes

On the other hand, hyper-converged systems are often implemented using local storage (when you do not need replication) and software-defined storage, often installed directly in Kubernetes, such as Rook/Ceph, OpenEBS, Longhorn, LINSTOR, and others.

A diagram showing clustered data storage running on the compute nodes

A hyper-converged system has its advantages, for example data locality: when your data is stored locally, access to that data is faster. But it also has disadvantages, as such a system is usually more difficult to manage and maintain.

At Ænix, we wanted to provide a ready-to-use solution that could be used without the need to purchase and set up additional external storage, and that was optimal in terms of speed and resource utilization. LINSTOR became that solution. Time-tested and industry-popular technologies such as LVM and ZFS as its backends give confidence that data is stored securely. DRBD-based replication is incredibly fast and consumes a small amount of computing resources. For installing LINSTOR in Kubernetes, there is the Piraeus project, which already provides ready-made block storage to use with KubeVirt.
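A minimal sketch of what this can look like with the LINSTOR CSI driver; the storage-pool and placement parameters are placeholders that depend on your Piraeus/LINSTOR configuration, and ReadWriteMany block volumes require that your DRBD setup allows concurrent access:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: linstor-replicated
provisioner: linstor.csi.linbit.com
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  # Placeholder parameters: adjust to your LINSTOR storage pools and desired replica count
  linstor.csi.linbit.com/storagePool: "data"
  linstor.csi.linbit.com/placementCount: "2"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vm-root-disk          # illustrative name
spec:
  storageClassName: linstor-replicated
  volumeMode: Block           # raw block device, no filesystem layer in between
  accessModes:
    - ReadWriteMany           # needed for KubeVirt live migration
  resources:
    requests:
      storage: 20Gi
```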
Note: if you are using Talos Linux, as we described in the previous article, you will need to enable the necessary kernel modules in advance and configure Piraeus as described in the instructions.

Networking for Kubernetes VMs

Despite having a similar interface (CNI), the network architecture in Kubernetes is actually more complex and typically consists of many independent components that are not directly connected to each other. In fact, Kubernetes networking can be split into four layers, which are described below.

Node Network (Data Center Network)

The network through which nodes are interconnected with each other. This network is usually not managed by Kubernetes, but it is an important one because without it nothing would work. In practice, bare-metal infrastructure usually has more than one such network, e.g. one for node-to-node communication, a second for storage replication, a third for external access, and so on.

A diagram showing the role of the node network (data center network) in the Kubernetes networking scheme

Configuring the physical network interaction between nodes goes beyond the scope of this article, as in most situations Kubernetes utilizes already existing network infrastructure.

Pod Network

This is the network provided by your CNI plugin. The task of the CNI plugin is to ensure transparent connectivity between all containers and nodes in the cluster. Most CNI plugins implement a flat network from which separate blocks of IP addresses are allocated for use on each node.

A diagram showing the role of the pod network (CNI plugin) in the Kubernetes networking scheme

In practice, your cluster can have several CNI plugins managed by Multus. This approach is often used in virtualization solutions based on KubeVirt, such as Rancher and OpenShift. The primary CNI plugin is used for integration with Kubernetes services, while additional CNI plugins are used to implement private networks (VPCs) and integration with the physical networks of your data center. The default CNI plugins can be used to connect bridges or physical interfaces. Additionally, there are specialized plugins, such as macvtap-cni, designed to provide better performance.

One additional aspect to keep in mind when running virtual machines in Kubernetes is the need for IPAM (IP Address Management), especially for secondary interfaces provided by Multus. This is commonly managed by a DHCP server operating within your infrastructure. Additionally, the allocation of MAC addresses for virtual machines can be managed by Kubemacpool.

In our platform, however, we decided to go another way and rely fully on Kube-OVN. This CNI plugin is based on OVN (Open Virtual Network), which was originally developed for OpenStack. It provides a complete network solution for virtual machines in Kubernetes, features Custom Resources for managing IP and MAC addresses, supports live migration while preserving IP addresses across nodes, and enables the creation of VPCs for physical network separation between tenants. In Kube-OVN you can assign separate subnets to an entire namespace or connect them as additional network interfaces using Multus.
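As a small sketch with placeholder values (name, CIDR, gateway, and namespace are illustrative), a dedicated Kube-OVN subnet can be bound to a tenant namespace like this:

```yaml
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: tenant-demo           # illustrative name
spec:
  protocol: IPv4
  cidrBlock: 10.66.0.0/16     # placeholder address range
  gateway: 10.66.0.1
  excludeIps:
    - 10.66.0.1
  # Pods (and KubeVirt VMs) created in these namespaces receive addresses from this subnet
  namespaces:
    - tenant-demo
```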
Services Network

In addition to the CNI plugin, Kubernetes also has a services network, which is primarily needed for service discovery. Unlike traditional virtual machines, Kubernetes was originally designed to run pods with random addresses, and the services network provides a convenient abstraction (stable IP addresses and DNS names) that always directs traffic to the correct pod. The same approach is also commonly used with virtual machines in clouds, despite the fact that their IPs are usually static.

A diagram showing the role of the services network (services network plugin) in the Kubernetes networking scheme

The implementation of the services network in Kubernetes is handled by the services network plugin. The standard implementation is called kube-proxy and is used in most clusters, but nowadays this functionality might also be provided as part of the CNI plugin. The most advanced implementation is offered by the Cilium project, which can run in kube-proxy replacement mode. Cilium is based on eBPF technology, which allows for efficient offloading of the Linux networking stack, thereby improving performance and security compared to traditional methods based on iptables.

In practice, Cilium and Kube-OVN can be easily integrated to provide a unified solution that offers seamless, multi-tenant networking for virtual machines, as well as advanced network policies and combined services network functionality.

External Traffic Load Balancer

At this stage, you already have everything needed to run virtual machines in Kubernetes. But there is actually one more thing: you still need to access your services from outside the cluster, and an external load balancer will help you organize this.

For bare-metal Kubernetes clusters, there are several load balancers available: MetalLB, kube-vip, and LoxiLB; Cilium and Kube-OVN also provide built-in implementations. The role of an external load balancer is to provide a stable, externally reachable address and to direct external traffic to the services network. The services network plugin will then direct it to your pods and virtual machines as usual.

A diagram showing the role of the external load balancer in the Kubernetes networking scheme

In most cases, setting up a load balancer on bare metal is achieved by creating a floating IP address on the nodes within the cluster and announcing it externally using ARP/NDP or BGP. After exploring various options, we decided that MetalLB is the simplest and most reliable solution, although we do not strictly enforce the use of it alone. Another benefit is that, in L2 mode, MetalLB speakers continuously check their neighbours' state by performing liveness checks using the memberlist protocol. This enables failover that works independently of the Kubernetes control plane.

Conclusion

This concludes our overview of virtualization, storage, and networking in Kubernetes. The technologies mentioned here are available and already pre-configured on the Cozystack platform, where you can try them with no limitations. In the next article, I'll detail how, on top of this, you can implement the provisioning of fully functional Kubernetes clusters with just the click of a button.

View the full article