Showing results for tags 'kubernetes'.
  1. With Kubernetes 1.30, we (SIG Auth) are moving Structured Authentication Configuration to beta. Today's article is about authentication: finding out who's performing a task, and checking that they are who they say they are. Check back in tomorrow to find out what's new in Kubernetes v1.30 around authorization (deciding what someone can and can't access).

Motivation

Kubernetes has had a long-standing need for a more flexible and extensible authentication system. The current system, while powerful, has some limitations that make it difficult to use in certain scenarios. For example, it is not possible to use multiple authenticators of the same type (e.g., multiple JWT authenticators) or to change the configuration without restarting the API server. The Structured Authentication Configuration feature is the first step towards addressing these limitations and providing a more flexible and extensible way to configure authentication in Kubernetes.

What is structured authentication configuration?

Kubernetes v1.30 builds on the experimental support for configuring authentication based on a file, which was added as alpha in Kubernetes v1.29. At this beta stage, Kubernetes only supports configuring JWT authenticators, which serve as the next iteration of the existing OIDC authenticator. A JWT authenticator authenticates Kubernetes users using JWT-compliant tokens; it attempts to parse a raw ID token and verifies that it was signed by the configured issuer. The Kubernetes project added configuration from a file so that it can provide more flexibility than using command line options (which continue to work, and are still supported). Supporting a configuration file also makes it easy to deliver further improvements in upcoming releases.

Benefits of structured authentication configuration

Here's why using a configuration file to configure cluster authentication is a benefit:

• Multiple JWT authenticators: You can configure multiple JWT authenticators simultaneously. This allows you to use multiple identity providers (e.g., Okta, Keycloak, GitLab) without needing to use an intermediary like Dex that handles multiplexing between multiple identity providers.
• Dynamic configuration: You can change the configuration without restarting the API server. This allows you to add, remove, or modify authenticators without disrupting the API server.
• Any JWT-compliant token: You can use any JWT-compliant token for authentication. This allows you to use tokens from any identity provider that supports JWT. The minimum valid JWT payload must contain the claims documented on the structured authentication configuration page in the Kubernetes documentation.
• CEL (Common Expression Language) support: You can use CEL to determine whether the token's claims match the user's attributes in Kubernetes (e.g., username, group). This allows you to use complex logic to determine whether a token is valid.
• Multiple audiences: You can configure multiple audiences for a single authenticator. This allows you to use the same authenticator for multiple audiences, such as using a different OAuth client for kubectl and dashboard.
• Identity providers that don't support OpenID Connect discovery: You can use identity providers that don't support OpenID Connect discovery. The only requirement is to host the discovery document at a different location than the issuer (such as locally in the cluster) and specify the issuer.discoveryURL in the configuration file.
How to use Structured Authentication Configuration

To use structured authentication configuration, you specify the path to the authentication configuration using the --authentication-config command line argument in the API server. The configuration file is a YAML file that specifies the authenticators and their configuration. Here is an example configuration file that configures two JWT authenticators:

apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
# Someone with a valid token from either of these issuers could authenticate
# against this cluster.
jwt:
- issuer:
    url: https://issuer1.example.com
    audiences:
    - audience1
    - audience2
    audienceMatchPolicy: MatchAny
  claimValidationRules:
  - expression: 'claims.hd == "example.com"'
    message: "the hosted domain name must be example.com"
  claimMappings:
    username:
      expression: 'claims.username'
    groups:
      expression: 'claims.groups'
    uid:
      expression: 'claims.uid'
    extra:
    - key: 'example.com/tenant'
      expression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')"
    message: "username cannot use reserved system: prefix"
# second authenticator that exposes the discovery document at a different location
# than the issuer
- issuer:
    url: https://issuer2.example.com
    discoveryURL: https://discovery.example.com/.well-known/openid-configuration
    audiences:
    - audience3
    - audience4
    audienceMatchPolicy: MatchAny
  claimValidationRules:
  - expression: 'claims.hd == "example.com"'
    message: "the hosted domain name must be example.com"
  claimMappings:
    username:
      expression: 'claims.username'
    groups:
      expression: 'claims.groups'
    uid:
      expression: 'claims.uid'
    extra:
    - key: 'example.com/tenant'
      expression: 'claims.tenant'
  userValidationRules:
  - expression: "!user.username.startsWith('system:')"
    message: "username cannot use reserved system: prefix"

Migration from command line arguments to configuration file

The Structured Authentication Configuration feature is designed to be backwards-compatible with the existing approach, based on command line options, for configuring the JWT authenticator. This means that you can continue to use the existing command-line options to configure the JWT authenticator. However, we (Kubernetes SIG Auth) recommend migrating to the new configuration file-based approach, as it provides more flexibility and extensibility.

Note: If you specify --authentication-config along with any of the --oidc-* command line arguments, this is a misconfiguration. In this situation, the API server reports an error and then immediately exits. If you want to switch to using structured authentication configuration, you have to remove the --oidc-* command line arguments and use the configuration file instead.

Here is an example of how to migrate from the command-line flags to the configuration file:

Command-line arguments

--oidc-issuer-url=https://issuer.example.com
--oidc-client-id=example-client-id
--oidc-username-claim=username
--oidc-groups-claim=groups
--oidc-username-prefix=oidc:
--oidc-groups-prefix=oidc:
--oidc-required-claim="hd=example.com"
--oidc-required-claim="admin=true"
--oidc-ca-file=/path/to/ca.pem

There is no equivalent in the configuration file for the --oidc-signing-algs flag. For Kubernetes v1.30, the authenticator supports all the asymmetric algorithms listed in oidc.go.
Configuration file

apiVersion: apiserver.config.k8s.io/v1beta1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://issuer.example.com
    audiences:
    - example-client-id
    certificateAuthority: <value is the content of file /path/to/ca.pem>
  claimMappings:
    username:
      claim: username
      prefix: "oidc:"
    groups:
      claim: groups
      prefix: "oidc:"
  claimValidationRules:
  - claim: hd
    requiredValue: "example.com"
  - claim: admin
    requiredValue: "true"

What's next?

For Kubernetes v1.31, we expect the feature to stay in beta while we get more feedback. In the coming releases, we want to investigate:

• Making distributed claims work via CEL expressions.
• Egress selector configuration support for calls to issuer.url and issuer.discoveryURL.

You can learn more about this feature on the structured authentication configuration page in the Kubernetes documentation. You can also follow along on KEP-3331 to track progress across the coming Kubernetes releases.

Try it out

In this post, I have covered the benefits the Structured Authentication Configuration feature brings in Kubernetes v1.30. To use this feature, you must specify the path to the authentication configuration using the --authentication-config command line argument. From Kubernetes v1.30, the feature is in beta and enabled by default. If you want to keep using command line arguments instead of a configuration file, those will continue to work as-is.

We would love to hear your feedback on this feature. Please reach out to us on the #sig-auth-authenticators-dev channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).

How to get involved

If you are interested in getting involved in the development of this feature, sharing feedback, or participating in any other ongoing SIG Auth projects, please reach out on the #sig-auth channel on Kubernetes Slack. You are also welcome to join the bi-weekly SIG Auth meetings, held every other Wednesday.

View the full article
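To make the wiring concrete, here is a minimal sketch of how an API server could be pointed at the authentication configuration file described above, using a kubeadm-style static Pod manifest. The file path, volume names, image tag, and manifest layout are illustrative assumptions, not part of the article above.

# Sketch only: an abbreviated kube-apiserver static Pod showing how
# --authentication-config could reference the AuthenticationConfiguration file.
# Paths and names below are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.30.0
    command:
    - kube-apiserver
    # ... other flags elided ...
    - --authentication-config=/etc/kubernetes/authn/config.yaml   # structured auth config
    volumeMounts:
    - name: authn-config
      mountPath: /etc/kubernetes/authn
      readOnly: true
  volumes:
  - name: authn-config
    hostPath:
      path: /etc/kubernetes/authn
      type: DirectoryOrCreate

Because the configuration is read from a file, edits to that file can take effect without an API server restart, which is the dynamic-configuration benefit the article describes.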
  2. On behalf of the Kubernetes project, I am excited to announce that ValidatingAdmissionPolicy has reached general availability as part of the Kubernetes 1.30 release. If you have not yet read about this new declarative alternative to validating admission webhooks, it may be interesting to read our previous post about the new feature. If you have already heard about ValidatingAdmissionPolicies and you are eager to try them out, there is no better time to do it than now. Let's have a taste of a ValidatingAdmissionPolicy, by replacing a simple webhook.

Example admission webhook

First, let's take a look at an example of a simple webhook. Here is an excerpt from a webhook that enforces runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation, and privileged to be set to the least permissive values.

func verifyDeployment(deploy *appsv1.Deployment) error {
    var errs []error
    for i, c := range deploy.Spec.Template.Spec.Containers {
        if c.Name == "" {
            return fmt.Errorf("container %d has no name", i)
        }

        if c.SecurityContext == nil {
            errs = append(errs, fmt.Errorf("container %q does not have SecurityContext", c.Name))
            continue // skip the remaining checks to avoid dereferencing a nil SecurityContext
        }
        if c.SecurityContext.RunAsNonRoot == nil || !*c.SecurityContext.RunAsNonRoot {
            errs = append(errs, fmt.Errorf("container %q must set RunAsNonRoot to true in its SecurityContext", c.Name))
        }
        if c.SecurityContext.ReadOnlyRootFilesystem == nil || !*c.SecurityContext.ReadOnlyRootFilesystem {
            errs = append(errs, fmt.Errorf("container %q must set ReadOnlyRootFilesystem to true in its SecurityContext", c.Name))
        }
        if c.SecurityContext.AllowPrivilegeEscalation != nil && *c.SecurityContext.AllowPrivilegeEscalation {
            errs = append(errs, fmt.Errorf("container %q must NOT set AllowPrivilegeEscalation to true in its SecurityContext", c.Name))
        }
        if c.SecurityContext.Privileged != nil && *c.SecurityContext.Privileged {
            errs = append(errs, fmt.Errorf("container %q must NOT set Privileged to true in its SecurityContext", c.Name))
        }
    }
    return errors.NewAggregate(errs)
}

Check out What are admission webhooks? Or, see the full code of this webhook to follow along with this walkthrough.

The policy

Now let's try to recreate the validation faithfully with a ValidatingAdmissionPolicy.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
    message: 'all containers must set runAsNonRoot to true'
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
    message: 'all containers must NOT set allowPrivilegeEscalation to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
    message: 'all containers must NOT set privileged to true'

Create the policy with kubectl. Great, no complaints so far. But let's get the policy object back and take a look at its status.
kubectl get -oyaml validatingadmissionpolicies/pod-security.policy.example.com

status:
  typeChecking:
    expressionWarnings:
    - fieldRef: spec.validations[3].expression
      warning: |
        apps/v1, Kind=Deployment: ERROR: <input>:1:76: undefined field 'Privileged'
         | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
         | ...........................................................................^
        ERROR: <input>:1:128: undefined field 'Privileged'
         | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
         | ...............................................................................................................................^

The policy was checked against its matched type, which is apps/v1.Deployment. Looking at the fieldRef, the problem was with the 3rd expression (indexes start at 0). The expression in question accessed an undefined Privileged field. Ahh, looks like it was a copy-and-paste error. The field name should be in lowercase.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
    message: 'all containers must set runAsNonRoot to true'
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
    message: 'all containers must NOT set allowPrivilegeEscalation to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.privileged) || !c.securityContext.privileged)
    message: 'all containers must NOT set privileged to true'

Check its status again, and you should see all warnings cleared. Next, let's create a namespace for our tests.

kubectl create namespace policy-test

Then, I bind the policy to the namespace. But at this point, I set the action to Warn so that the policy prints out warnings instead of rejecting the requests. This is especially useful to collect results from all expressions during development and automated testing.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "pod-security.policy-binding.example.com"
spec:
  policyName: "pod-security.policy.example.com"
  validationActions: ["Warn"]
  matchResources:
    namespaceSelector:
      matchLabels:
        "kubernetes.io/metadata.name": "policy-test"

Test out policy enforcement.
kubectl create -n policy-test -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
EOF

Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set runAsNonRoot to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set readOnlyRootFilesystem to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set allowPrivilegeEscalation to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set privileged to true
Error from server: error when creating "STDIN": admission webhook "webhook.example.com" denied the request: [container "nginx" must set RunAsNonRoot to true in its SecurityContext, container "nginx" must set ReadOnlyRootFilesystem to true in its SecurityContext, container "nginx" must NOT set AllowPrivilegeEscalation to true in its SecurityContext, container "nginx" must NOT set Privileged to true in its SecurityContext]

Looks great! The policy and the webhook give equivalent results. After a few other cases, when we are confident in our policy, maybe it is time to do some cleanup:

• For every expression, we repeat access to object.spec.template.spec.containers and to each securityContext.
• There is a pattern of checking the presence of a field and then accessing it, which looks a bit verbose.

Fortunately, since Kubernetes 1.28, we have new solutions for both issues. Variable Composition allows us to extract repeated sub-expressions into their own variables. Kubernetes enables the optional library for CEL, which is excellent for working with fields that are, you guessed it, optional. With both features in mind, let's refactor the policy a bit.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  variables:
  - name: containers
    expression: object.spec.template.spec.containers
  - name: securityContexts
    expression: 'variables.containers.map(c, c.?securityContext)'
  validations:
  - expression: variables.securityContexts.all(c, c.?runAsNonRoot == optional.of(true))
    message: 'all containers must set runAsNonRoot to true'
  - expression: variables.securityContexts.all(c, c.?readOnlyRootFilesystem == optional.of(true))
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: variables.securityContexts.all(c, c.?allowPrivilegeEscalation != optional.of(true))
    message: 'all containers must NOT set allowPrivilegeEscalation to true'
  - expression: variables.securityContexts.all(c, c.?privileged != optional.of(true))
    message: 'all containers must NOT set privileged to true'

The policy is now much cleaner and more readable. Update the policy, and you should see it function the same as before.
Now let's change the policy binding from warning to actually denying requests that fail validation.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "pod-security.policy-binding.example.com"
spec:
  policyName: "pod-security.policy.example.com"
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        "kubernetes.io/metadata.name": "policy-test"

And finally, remove the webhook. Now the result should include only messages from the policy.

kubectl create -n policy-test -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
EOF

The deployments "nginx" is invalid: : ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com' denied request: all containers must set runAsNonRoot to true

Please notice that, by design, the policy stops evaluation after the first expression that causes the request to be denied. This is different from what happens when the expressions generate only warnings.

Set up monitoring

Unlike a webhook, a policy is not a dedicated process that can expose its own metrics. Instead, you can use metrics from the API server in their place. Here are some examples in Prometheus Query Language of common monitoring tasks.

To find the 95th percentile execution duration of the policy shown above:

histogram_quantile(0.95, sum(rate(apiserver_validating_admission_policy_check_duration_seconds_bucket{policy="pod-security.policy.example.com"}[5m])) by (le))

To find the rate of policy evaluation:

rate(apiserver_validating_admission_policy_check_total{policy="pod-security.policy.example.com"}[5m])

You can read the metrics reference to learn more about the metrics above. The metrics of ValidatingAdmissionPolicy are currently in alpha, and more and better metrics will come as the feature's stability graduates in future releases.

View the full article
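For contrast with the rejected manifest above, here is a minimal sketch of a Deployment that should satisfy all four validations in the policy. The image and names are illustrative assumptions; a real workload may need additional writable mounts (for example, cache directories) once the root filesystem is read-only.

# Sketch only: a Deployment whose containers satisfy the policy above.
# Image and names are placeholders for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-compliant
  labels:
    app: nginx-compliant
spec:
  selector:
    matchLabels:
      app: nginx-compliant
  template:
    metadata:
      labels:
        app: nginx-compliant
    spec:
      containers:
      - name: nginx
        image: nginxinc/nginx-unprivileged   # assumption: an image that can run as non-root
        securityContext:
          runAsNonRoot: true                 # required by the first validation
          readOnlyRootFilesystem: true       # required by the second validation
          allowPrivilegeEscalation: false    # must not be true
          privileged: false                  # must not be true
        volumeMounts:
        - name: tmp                          # writable scratch space since the root FS is read-only
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}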
  3. Amazon CloudWatch Container Insights with Enhanced Observability for EKS now auto-discovers critical health metrics from your AWS accelerators Trainium and Inferentia, and AWS high performance network adapters (Elastic Fabric Adapters) as well as NVIDIA GPUs. You can visualize these out-of-the-box metrics in curated Container Insights dashboards to help monitor your accelerated infrastructure and optimize your AI workloads for operational excellence. View the full article
  4. Author: Akihiro Suda (NTT)

Read-only volume mounts have been a feature of Kubernetes since the beginning. Surprisingly, read-only mounts are not completely read-only under certain conditions on Linux. As of the v1.30 release, they can be made completely read-only, with alpha support for recursive read-only mounts.

Read-only volume mounts are not really read-only by default

Volume mounts can be deceptively complicated. You might expect that the following manifest makes everything under /mnt in the containers read-only:

---
apiVersion: v1
kind: Pod
spec:
  volumes:
  - name: mnt
    hostPath:
      path: /mnt
  containers:
  - volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true

However, any sub-mounts beneath /mnt may still be writable! For example, consider that /mnt/my-nfs-server is writeable on the host. Inside the container, writes to /mnt/* will be rejected but /mnt/my-nfs-server/* will still be writeable.

New mount option: recursiveReadOnly

Kubernetes 1.30 added a new mount option recursiveReadOnly so as to make submounts recursively read-only. The option can be enabled as follows:

---
apiVersion: v1
kind: Pod
spec:
  volumes:
  - name: mnt
    hostPath:
      path: /mnt
  containers:
  - volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true
      # NEW
      # Possible values are `Enabled`, `IfPossible`, and `Disabled`.
      # Needs to be specified in conjunction with `readOnly: true`.
      recursiveReadOnly: Enabled

This is implemented by applying the MOUNT_ATTR_RDONLY attribute with the AT_RECURSIVE flag using mount_setattr(2), added in Linux kernel v5.12. For backwards compatibility, the recursiveReadOnly field is not a replacement for readOnly, but is used in conjunction with it. To get a properly recursive read-only mount, you must set both fields.

Feature availability

To enable recursiveReadOnly mounts, the following components have to be used:

• Kubernetes: v1.30 or later, with the RecursiveReadOnlyMounts feature gate enabled. As of v1.30, the gate is marked as alpha.
• CRI runtime: containerd v2.0 or later
• OCI runtime: runc v1.1 or later, or crun v1.8.6 or later
• Linux kernel: v5.12 or later

What's next?

Kubernetes SIG Node hope - and expect - that the feature will be promoted to beta and eventually general availability (GA) in future releases of Kubernetes, so that users no longer need to enable the feature gate manually. The default value of recursiveReadOnly will still remain Disabled, for backwards compatibility.

How can I learn more?

Please check out the documentation for the further details of recursiveReadOnly mounts.

How to get involved?

This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!

View the full article
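As a quick way to see the difference described above, here is a sketch of a Pod that mounts /mnt recursively read-only and then tries to write both at the top level and inside a submount. The image, command, and the /mnt/my-nfs-server submount path are illustrative assumptions; with recursiveReadOnly: Enabled both writes should fail, whereas with only readOnly: true the submount write may succeed.

# Sketch only: verify recursive read-only behaviour from inside the container.
# Image, paths, and the submount are assumptions for illustration; requires the
# RecursiveReadOnlyMounts feature gate and a supported runtime/kernel (see above).
apiVersion: v1
kind: Pod
metadata:
  name: rro-check
spec:
  restartPolicy: Never
  volumes:
  - name: mnt
    hostPath:
      path: /mnt
  containers:
  - name: check
    image: busybox
    command:
    - sh
    - -c
    # Both writes are expected to fail with "Read-only file system"
    # when recursiveReadOnly is Enabled.
    - "touch /mnt/top-level-file; touch /mnt/my-nfs-server/sub-mount-file; sleep 3600"
    volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true
      recursiveReadOnly: Enabled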
  5. Authors: Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat)

Linux provides different namespaces to isolate processes from each other. For example, a typical Kubernetes pod runs within a network namespace to isolate the network identity and a PID namespace to isolate the processes. One Linux namespace that was left behind is the user namespace. This namespace allows us to isolate the user and group identifiers (UIDs and GIDs) we use inside the container from the ones on the host. This is a powerful abstraction that allows us to run containers as "root": we are root inside the container and can do everything root can inside the pod, but our interactions with the host are limited to what a non-privileged user can do. This is great for limiting the impact of a container breakout.

A container breakout is when a process inside a container can break out onto the host using some unpatched vulnerability in the container runtime or the kernel and can access/modify files on the host or other containers. If we run our pods with user namespaces, the privileges the container has over the rest of the host are reduced, and the files outside the container it can access are limited too.

In Kubernetes v1.25, we introduced support for user namespaces only for stateless pods. Kubernetes 1.28 lifted that restriction, and now, with Kubernetes 1.30, we are moving to beta!

What is a user namespace?

Note: Linux user namespaces are a different concept from Kubernetes namespaces. The former is a Linux kernel feature; the latter is a Kubernetes feature.

User namespaces are a Linux feature that isolates the UIDs and GIDs of the containers from the ones on the host. The identifiers in the container can be mapped to identifiers on the host in a way where the host UIDs/GIDs used for different containers never overlap. Furthermore, the identifiers can be mapped to unprivileged, non-overlapping UIDs and GIDs on the host. This brings two key benefits:

• Prevention of lateral movement: As the UIDs and GIDs for different containers are mapped to different UIDs and GIDs on the host, containers have a harder time attacking each other, even if they escape the container boundaries. For example, suppose container A runs with different UIDs and GIDs on the host than container B. In that case, the operations it can do on container B's files and processes are limited: it can only read/write what a file allows to others, as it will never have owner or group permission (the UIDs/GIDs on the host are guaranteed to be different for different containers).
• Increased host isolation: As the UIDs and GIDs are mapped to unprivileged users on the host, if a container escapes the container boundaries, even if it runs as root inside the container, it has no privileges on the host. This greatly protects what host files it can read/write, which processes it can send signals to, etc. Furthermore, capabilities granted are only valid inside the user namespace and not on the host, limiting the impact a container escape can have.

User namespace IDs allocation

Without using a user namespace, a container running as root has, in the case of a container breakout, root privileges on the node. If some capabilities were granted to the container, the capabilities are valid on the host too. None of this is true when using user namespaces (modulo bugs, of course).
Changes in 1.30

In Kubernetes 1.30, besides moving user namespaces to beta, the contributors working on this feature:

• Introduced a way for the kubelet to use custom ranges for the UID/GID mapping
• Added a way for Kubernetes to enforce that the runtime supports all the features needed for user namespaces. If they are not supported, Kubernetes will show a clear error when trying to create a pod with user namespaces. Before 1.30, if the container runtime didn't support user namespaces, the pod could be created without a user namespace.
• Added more tests, including tests in the cri-tools repository.

You can check the documentation on user namespaces for how to configure custom ranges for the mapping.

Demo

A few months ago, CVE-2024-21626 was disclosed. This vulnerability has a score of 8.6 (HIGH). It allows an attacker to escape a container and read/write to any path on the node and other pods hosted on the same node. Rodrigo created a demo that exploits CVE-2024-21626 and shows how the exploit, which works without user namespaces, is mitigated when user namespaces are in use. Please note that with user namespaces, an attacker can do on the host file system what the permission bits for "others" allow. Therefore, the CVE is not completely prevented, but the impact is greatly reduced.

Node system requirements

There are requirements on the Linux kernel version and the container runtime to use this feature. On Linux you need Linux 6.3 or greater. This is because the feature relies on a kernel feature named idmap mounts, and support for using idmap mounts with tmpfs was merged in Linux 6.3.

If you are using CRI-O with crun: as always, you can expect support for Kubernetes 1.30 with CRI-O 1.30. Please note you also need crun 1.9 or greater. If you are using CRI-O with runc, this is still not supported.

Containerd support is currently targeted for containerd 2.0, and the same crun version requirements apply. If you are using containerd with runc, this is still not supported.

Please note that containerd 1.7 added experimental support for user namespaces, as implemented in Kubernetes 1.25 and 1.26. We did a redesign in Kubernetes 1.27, which requires changes in the container runtime. Those changes are not present in containerd 1.7, so it only works with the user namespaces support in Kubernetes 1.25 and 1.26. Another limitation of containerd 1.7 is that it needs to change the ownership of every file and directory inside the container image during Pod startup. This has a storage overhead and can significantly impact the container startup latency. Containerd 2.0 will probably include an implementation that will eliminate the added startup latency and storage overhead. Consider this if you plan to use containerd 1.7 with user namespaces in production.

None of these containerd 1.7 limitations apply to CRI-O.

How do I get involved?

You can reach SIG Node by several means:

• Slack: #sig-node
• Mailing list
• Open Community Issues/PRs

You can also contact us directly:

• GitHub: @rata @giuseppe @saschagrunert
• Slack: @rata @giuseppe @sascha

View the full article
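To show what opting in looks like, here is a minimal sketch of a Pod that requests a user namespace by setting hostUsers: false. The name, image, and command are illustrative assumptions; the feature gate for user namespaces and a runtime/kernel combination listed above are required.

# Sketch only: a Pod that opts into a user namespace.
# Names and image are placeholders; requires the user namespaces feature gate
# and a supported runtime/kernel as described in the article above.
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false        # false = run this pod in its own user namespace
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "id && cat /proc/self/uid_map && sleep 3600"]

Inside the container, id still reports root (UID 0), while /proc/self/uid_map shows that UID 0 is mapped to an unprivileged UID range on the host.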
  6. Today, we are excited to announce that customers will now be able to use Apache Livy to submit their Apache Spark jobs to Amazon EMR on EKS, in addition to using StartJobRun API, Spark Operator, Spark Submit and Interactive Endpoints. With this launch, customers will be able to use a REST interface to easily submit Spark jobs or snippets of Spark code, retrieve results synchronously or asynchronously while continuing to get all of the Amazon EMR on EKS benefits such as EMR optimized Spark runtime, SSL secured Livy endpoint, programmatic set-up experience etc. View the full article
  7. This post was coauthored by Venkatesh Nannan, Sr. Engineering Manager at Rippling

Introduction

Rippling is a workforce management system that eliminates the friction of running a business, combining HR, IT, and Finance apps on a unified data platform. Rippling's mission is to free up intelligent people to work on hard problems.

Existing stack

Rippling uses a modular monolith architecture with different Docker entrypoints for multiple services and background jobs. These components are managed within a single, large, multi-tenant production cluster on Amazon Elastic Kubernetes Service (Amazon EKS) per region, at a scale of over 1000 nodes. Rippling's infra stack consists of:

• Karpenter for cluster autoscaling – a flexible, high-performance Kubernetes cluster autoscaler making sure of optimal compute capacity.
• Horizontal Pod Autoscaler for scaling Kubernetes pods based on demand.
• KEDA, an event-driven autoscaler, for scaling background job processing containers based on event volume.
• IAM Roles for Service Accounts (IRSA) to provide temporary AWS Identity and Access Management (IAM) credentials to Kubernetes pods, enabling access to AWS resources such as Amazon Simple Storage Service (Amazon S3) buckets.
• Argo CD, an open-source, GitOps continuous delivery tool, to deploy applications and add-on software to the Kubernetes cluster.
• AWS Load Balancer Controller to expose Kubernetes services to end users.
• TargetGroupBinding Custom Resource to bind pods to Application Load Balancer (ALB) target groups.
• Amazon EKS managed node groups spanning multiple Availability Zones (AZs).

In addition to these technologies, we were using Cilium CNI for controlling network traffic between pods. However, we were running into challenges with this part of our stack, so we decided to look for alternatives.

Figure 1: High level architecture of Rippling

Challenges

As Amazon EKS version 1.23 approached end-of-life, upgrading to v1.27 became imperative. However, during our initial attempts at upgrading to v1.24 in our non-production environment, we encountered a significant hurdle: new nodes running Cilium failed to join the cluster, increasing our downtime and requiring operational work on the CNI plugin. As a company, we prioritize using managed services to streamline operations and focus on adding value to our business. This Kubernetes upgrade task gave us an opportunity to look at alternatives that would be easier to maintain. We saw that AWS had just announced VPC CNI support for Kubernetes network policies using eBPF. We realized that migrating to this solution would enable us to replace our third-party networking add-on and rely solely on VPC CNI for both cluster networking and network policy implementation. This change would help reduce the overhead of managing operational software needed for cluster networking.

Introduction of Amazon VPC CNI support for network policies

When AWS announced VPC CNI support for Kubernetes Network Policies using eBPF, we wanted to use the Amazon VPC CNI to secure the traffic in our Kubernetes clusters and simplify our EKS cluster management and operations. As network policy agents are bundled in existing VPC CNI pods, we would no longer need to run additional daemon pods and network policy controllers on the worker nodes. We followed the blue-green cluster upgrade strategy and were able to safely migrate the traffic from the old cluster to the new cluster with minimal risk of breaking existing workloads.
Planning the migration

We did an inventory of the applied network policies in our existing cluster and the various ingress/egress features used. This helped us identify deviations from upstream Kubernetes Network Policies, which is necessary for migrating, as Amazon VPC CNI supports only the upstream Kubernetes network policies as of this writing. Rippling was not using advanced features from our third-party network policy engines such as Global Network Policies, DNS-based policy rules, or rule priority. Therefore, we did not need Custom Resource Definition (CRD) transformations going into the migration process.

AWS recommends converting third-party NetworkPolicy CRDs to Kubernetes Network Policy resources and testing the converted policies in a separate test cluster before migrating from a third-party to the VPC CNI Network Policy engine in production. To assist in the migration process, AWS has developed a tool called K8s Network Policy Migrator that converts existing supported Calico/Cilium network policy CRDs to Kubernetes native network policies. After conversion you can directly test the converted network policies on your new clusters running the VPC CNI network policy controller. The tool is designed to help streamline the migration process and make sure of a smooth transition.

Picking a migration strategy

There are broadly two strategies to migrate the CNI plugin in an EKS cluster: (1) in-place and (2) blue-green. The in-place strategy replaces an existing third-party CNI plugin with the VPC CNI plugin with network policy support in an existing EKS cluster. This would entail the following steps:

1. Create a new label "cni-plugin=3p" on the existing Amazon EKS managed node groups and Karpenter NodePool resources.
2. Update the existing third-party CNI DaemonSet to schedule CNI pods on those labeled nodes.
3. Deploy the Amazon EKS add-on version of Amazon VPC CNI and schedule its pods to nodes without the "cni-plugin" label. At this point the existing nodes have third-party CNI plugin pods and not the VPC CNI pods.
4. Launch new Amazon EKS managed node groups and Karpenter NodePool resources without the "cni-plugin=3p" label so that VPC CNI pods can be scheduled to those nodes.
5. Drain and delete the existing Amazon EKS managed node groups and Karpenter NodePool resources to move the workloads to the new worker nodes with VPC CNI.
6. Finally, delete the third-party CNI and associated network policy controllers from the cluster.

As you can see, this process is involved, needs careful orchestration, and is more prone to errors that impact application availability. The second approach is the blue-green strategy, in which a new EKS cluster is launched with the VPC CNI plugin and then the workloads are migrated to it. This approach is safer since it can be rolled back and provides the ability to test the setup in isolation before routing the live production traffic. Therefore, we chose the blue-green strategy for our migration.

Migration

As part of the blue-green strategy, we created a new EKS cluster with the Amazon VPC CNI and enabled Network Policy support by customizing the VPC CNI Amazon EKS add-on configuration. We also deployed the Argo CD agent on the cluster and bootstrapped it using Argo CD's App of Apps pattern to deploy the applications into the cluster. Network policies were also deployed to the cluster using Argo CD. This was tested in a non-production environment to validate the migration from the third-party CNI to VPC CNI and to make sure that applications and services passed functional tests.
Then we could safely migrate the traffic from the old cluster to the new cluster without risks by leveraging the same strategy in the production environment. Lessons learned Amazon VPC CNI uses the VPC IP space to assign IP addresses to k8s pods. This led us to realize our existing VPCs were not properly sized to meet the growing number of k8s pods. We added a permitted secondary CIDR block 100.64.0.0/10 to the VPC and configured VPC CNI Custom Networking feature to assign those IP addresses to the k8s pods. This proactive measure makes sure of scalability as our infrastructure expands, mitigating concerns about IP address exhaustion. Leveraging automation and Infrastructure-as-Code (IaC) is recommended, especially as we are replicating existing clusters and migrating the workloads to them. Conclusion In this post, we discussed how Rippling migrated from third-party CNI to Amazon VPC CNI in their Amazon EKS clusters and enabled network policy support to secure pod-to-pod communications. Rippling used the blue-green strategy for the migration to minimize the application impact, and safely cut over the traffic to the new cluster. This migration helped Rippling to use the native features offered by AWS and reduced the burden of managing the operational software in our EKS clusters. Venkatesh Nannan, Sr. Engineering Manager – Infrastructure at Rippling Venkatesh Nannan is a seasoned Engineering leader with expertise in building scalable cloud-native applications, specializing in backend development and infrastructure architecture. View the full article
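To ground the policy conversion discussed in this migration, here is a sketch of the kind of upstream Kubernetes NetworkPolicy that the VPC CNI network policy agent enforces. The namespace, labels, and port are illustrative assumptions rather than Rippling's actual policies.

# Sketch only: an upstream NetworkPolicy of the kind the VPC CNI agent enforces.
# Namespace, labels, and ports are illustrative placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080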
  8. Patent-search platform provider IPRally is growing quickly, servicing global enterprises, IP law firms, and multiple national patent and trademark offices. As the company grows, so do its technology needs. It continues to train its models for greater accuracy, adding 200,000 searchable records for customer access weekly, and mapping new patents. With millions of patent documents published annually – and the technical complexity of those documents increasing — it can take even the most seasoned patent professional several hours of research to resolve a case with traditional patent search tools. In 2018, Finnish firm IPRally set out to tackle this problem with a graph-based approach. “Search engines for patents were mostly complicated boolean ones, where you needed to spend hours building a complicated query,” says Juho Kallio, CTO and co-founder of the 50-person firm. “I wanted to build something important and challenging.” Using machine learning (ML) and natural language processing (NLP), the company has transformed the text from over 120 million global patent documents into document-level knowledge graphs embedded into a searchable vector space. Now, patent researchers can receive relevant results in seconds with AI-selected highlights of key information and explainable results. To meet those needs, IPRally built a customized ML platform using Google Kubernetes Engine (GKE) and Ray, an open-source ML framework, balancing efficiency, performance and streamlining machine learning operations (MLOps). The company uses open-source KubeRay to deploy and manage Ray on GKE, which enables them to leverage cost-efficient NVIDIA GPU Spot instances for exploratory ML research and development. It also uses Google Cloud data building blocks, including Cloud Storage and Compute Engine persistent disks. Next on the horizon is expanding to big data solutions with Ray Data and BigQuery. “Ray on GKE has the ability to support us in the future with any scale and any kind of distributed complex deep learning,” says Kallio. A custom ML platform built for performance and efficiency The IPRally engineering team’s primary focus is on R&D and how it can continue to improve its Graph AI to make technical knowledge more accessible. With just two DevOps engineers and one MLOps engineer, IPRally was able to build its own customized ML platform with GKE and Ray as key components. A big proponent of open source, IPRally transitioned everything to Kubernetes when their compute needs grew. However, they didn’t want to have to manage Kubernetes themselves. That led them to GKE, with its scalability, flexibility, open ecosystem, and its support for a diverse set of accelerators. All told, this provides IPRally the right balance of performance and cost, as well as easy management of compute resources and the ability to efficiently scale down capacity when they don’t need it. “GKE provides the scalability and performance we need for these complex training and serving needs and we get the right granularity of control over data and compute,” says Kallio. One particular GKE capability that Kallio highlights is container image streaming, which has significantly accelerated their start-up time. “We have seen that container image streaming in GKE has a significant impact on expediting our application startup time. Image streaming helps us accelerate our start-up time for a training job after submission by 20%,” he shares. 
“And, when we are able to reuse an existing pod, we can start up in a few seconds instead of minutes.”

The next layer is Ray, which the company uses to scale the distributed, parallelized Python and Clojure applications it uses for machine learning. To more easily manage Ray, IPRally uses KubeRay, a specialized tool that simplifies Ray cluster management on Kubernetes. IPRally uses Ray for the most advanced tasks like massive preprocessing of data and exploratory deep learning in R&D. “Interoperability between Ray and GKE autoscaling is smooth and robust. We can combine computational resources without any constraints,” says Kallio.

The heaviest ML loads are mainly deployed on G2 VMs featuring up to eight NVIDIA L4 Tensor Core GPUs, which deliver cutting-edge performance-per-dollar for AI inference workloads. And by leveraging them within GKE, IPRally facilitates the creation of nodes on demand and scales GPU resources as needed, thus optimizing its operational costs. There is a single Terraform-provisioned Kubernetes cluster in each of the regions where IPRally searches for inexpensive Spot instances. GKE and Ray then step in for compute orchestration and automated scaling.

To further ease MLOps, IPRally built its own thin orchestration layer, IPRay, atop KubeRay and Ray. This layer provides a command line tool for data scientists to easily provision a templated Ray cluster that scales efficiently up and down and that can run jobs in Ray without needing to know Terraform. This self-service layer reduces friction and allows both engineers and data scientists to focus on their higher-value work.

Technology paves the way for strong growth

Through this selection of Google Cloud and open-source frameworks, IPRally has shown that a startup can build an enterprise-grade ML platform without spending millions of dollars. Focusing on providing a powerful MLOps and automation foundation from its earliest days has paid dividends in efficiency and the team’s ability to focus on R&D. “Crafting a flexible ML infrastructure from the best parts has been more than worth it,” shares Jari Rosti, an ML engineer at IPRally. “Now, we’re seeing the benefits of that investment multiply as we adapt the infrastructure to the constantly evolving ideas of modern ML. That’s something other young companies can achieve as well by leveraging Google Cloud and Ray.”

Further, the company has been saving 70% of ML R&D costs by using Spot instances. These affordable instances offer the same quality VMs as on-demand instances but are subject to interruption. But because IPRally’s R&D workloads are fault-tolerant, they are a good fit for Spot instances. IPRally closed a €10m A round investment last year, and it’s forging on with ingesting and processing IP documentation from around the globe, with a focus on improving its graph neural network models and building the best AI platform for patent searching. With 3.4 million patents filed in 2022, the third consecutive year of growth, data will keep flowing and IPRally can continue helping intellectual property professionals find every relevant bit of information.

"With Ray on GKE, we've built an ML foundation that is a testament to how powerful Google Cloud is with AI," says Kallio. “And now, we’re prepared to explore far more advanced deep learning and to keep growing.”

View the full article
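As a rough illustration of the pattern described above (running fault-tolerant R&D jobs on GKE Spot nodes with NVIDIA L4 GPUs), here is a sketch of a Kubernetes Job that targets Spot capacity and requests one L4 GPU via standard GKE node labels. The image, command, and names are assumptions for illustration, not IPRally's actual workloads.

# Sketch only: a fault-tolerant training Job pinned to GKE Spot nodes with NVIDIA L4 GPUs.
# Image, command, and names are placeholders for illustration.
apiVersion: batch/v1
kind: Job
metadata:
  name: exploratory-training
spec:
  backoffLimit: 10            # tolerate Spot preemptions by retrying
  template:
    spec:
      restartPolicy: OnFailure
      nodeSelector:
        cloud.google.com/gke-spot: "true"            # schedule onto Spot nodes
        cloud.google.com/gke-accelerator: nvidia-l4  # nodes with NVIDIA L4 GPUs
      containers:
      - name: trainer
        image: us-docker.pkg.dev/example-project/ml/trainer:latest   # placeholder image
        command: ["python", "train.py"]                              # placeholder entrypoint
        resources:
          limits:
            nvidia.com/gpu: 1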
  9. Azure Container Apps

Azure Container Apps is a fully managed, serverless container service built on Kubernetes that could be compared to ECS in AWS or Cloud Run in GCP. Compared to AKS, all integrations with Azure are already done for you. The best example is the use of managed identity, where here you only need to enable a parameter, whereas in AKS it's complicated and changes every two years. View the full article
  10. GitOps represents a transformative approach to managing and deploying applications within Kubernetes environments, offering many benefits ranging from automation to enhanced collaboration. By centralizing operations around Git repositories, GitOps streamlines processes, fosters reliability, and nurtures teamwork. However, as teams embrace GitOps principles, the natural question arises: can these principles extend to managing databases? The answer is a resounding yes! Yet, while GitOps seamlessly aligns with stateless application management, applying it to stateful workloads, especially databases, presents distinct challenges. In this article, we'll delve into the landscape of implementing GitOps for stateful applications and databases in Kubernetes, exploring five essential best practices to navigate this terrain effectively.

Considerations for applying GitOps to stateful applications

Versioning and managing stateful data

When applying GitOps principles, it's crucial to version control persistent data alongside application code. Tools like Git LFS (Large File Storage) can help you manage large datasets efficiently. Ensure that changes to stateful data are captured in Git commits and properly documented to maintain data integrity and facilitate reproducibility.

Handling database schema changes and migrations

Database schema changes and migrations require careful handling in GitOps workflows. Define database schema changes as code and store migration scripts in version-controlled repositories. Test and apply migrations consistently across environments with automated tools and continuous integration/continuous delivery processes.

Backup and disaster recovery strategies

Develop robust backup and disaster recovery strategies for stateful applications. Regularly back up data and configuration files to resilient storage solutions. Test data recovery and automate backup procedures to ensure preparedness for unforeseen events or data loss. You can also leverage GitOps practices to manage backup configurations and version-controlled recovery plans.

Managing migrations like any other GitOps application

As applications are deployed and managed using GitOps principles, migrations should follow suit, as shown in the sketch after this section. This means defining migration tasks as declarative configurations stored in Git repositories alongside other application artifacts. These migration configurations should specify the desired state of the database schema or data transformation, including any dependencies or prerequisites. GitOps operators, such as the Atlas Operator for databases, can then pull these migration configurations from Git repositories and apply them to target databases. The operator ensures that the database's actual state aligns with the desired state defined in the Git repository, automating the process of applying migrations and maintaining consistency across environments.

Handling stateful application upgrades and rollbacks

Plan and execute stateful application upgrades and rollbacks carefully. Define upgrade strategies that minimize downtime and data loss during the migration process. Utilize GitOps principles to manage version-controlled manifests for application upgrades, ensuring consistency and reproducibility across environments. Implement automated rollback mechanisms to revert to previous application versions in case of failures or issues during upgrades.
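To make the "migrations as declarative configuration" idea above concrete, here is a minimal sketch of a migration expressed as a Kubernetes Job manifest that would live in Git next to the application manifests. The image, command, and Secret names are hypothetical placeholders, and in practice a dedicated operator (such as the Atlas Operator mentioned above) would replace this hand-rolled Job.

# Sketch only: a declarative migration task stored in Git alongside app manifests.
# Image, command, and secret names are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: orders-db-migrate-v42
  labels:
    app.kubernetes.io/part-of: orders
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: registry.example.com/orders/migrations:v42   # placeholder image with migration scripts baked in
        command: ["./run-migrations.sh", "--target", "v42"] # placeholder entrypoint
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: orders-db-credentials                   # placeholder Secret
              key: url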
Best practices for implementing GitOps with stateful applications

Infrastructure as Code (IaC) for provisioning storage resources

Make your storage resources part of your Infrastructure as Code (IaC) practices. Define your storage configurations using tools like Terraform or Kubernetes manifests, and keep them version-controlled alongside your application code. This ensures consistency and reproducibility in your infrastructure deployments.

Using Helm charts or operators

Simplify the deployment and management of your stateful applications, including databases, by using Helm charts or Kubernetes operators. Helm helps package and template complex configurations, while operators automate common operational tasks. Pick the best fit for your needs and keep your application management consistent.

Implementing automated testing

Automate your testing as much as possible for your stateful applications, including database changes. Develop thorough test suites to check everything from functionality to performance. Tools like Kubernetes Testing Framework (KTF) can help simulate production-like environments and catch issues early on.

CI/CD pipelines

Set up CI/CD pipelines tailored to your stateful applications, focusing on databases. Automate your build, test, and deployment processes to ensure smooth operation. Remember to trigger pipeline executions based on version-controlled changes so you have consistent deployments across different environments.

Storing everything in Git

By storing everything in Git repositories, teams benefit from version control, traceability, and collaboration. Every change made to configurations or migrations is tracked, providing a clear history of modifications and enabling easy rollback to previous states if necessary. Moreover, Git's branching and merging capabilities facilitate collaborative development efforts, allowing multiple team members to work concurrently on different features or fixes without stepping on each other's toes.

Mastering GitOps for stateful workloads

Applying GitOps principles to stateful applications, including databases, brings numerous benefits to development and operations teams. By storing everything in Git repositories, including data, migration scripts, and configurations, teams ensure version control, traceability, and collaboration. Handling database schema changes, migrations, and backups within GitOps workflows ensures consistency and reliability across environments. Moreover, managing migrations and upgrades as part of GitOps applications streamlines deployment processes and reduces the risk of errors.

Implementing best practices such as Infrastructure as Code (IaC), leveraging Helm charts or operators, and implementing automated testing and CI/CD pipelines further enhances the efficiency and reliability of managing stateful applications. By adopting GitOps for stateful applications, organizations can achieve greater agility, scalability, and resilience in their software delivery processes. With a solid foundation of GitOps principles and best practices in place, teams can confidently navigate the complexities of managing stateful applications in Kubernetes environments, enabling them to focus on efficiently delivering value to their customers.

The post 5 Best practices for implementing GitOps for stateful applications and databases in Kubernetes appeared first on Amazic. View the full article
  11. Starting today, customers can receive granular cost visibility for Amazon Elastic Kubernetes Service (Amazon EKS) in the AWS Cost and Usage Reports (CUR), enabling you to analyze, optimize, and chargeback cost and usage for your Kubernetes applications. With AWS Split Cost Allocation Data for Amazon EKS, customers can now allocate application costs to individual business units and teams based on how Kubernetes applications consume shared EC2 CPU and memory resources. View the full article
  12. Editors: Amit Dsouza, Frederick Kautz, Kristin Martin, Abigail McCarthy, Natali Vlatko

Announcing the release of Kubernetes v1.30: Uwubernetes, the cutest release!

Similar to previous releases, the release of Kubernetes v1.30 introduces new stable, beta, and alpha features. The consistent delivery of top-notch releases underscores the strength of our development cycle and the vibrant support from our community. This release consists of 45 enhancements. Of those enhancements, 17 have graduated to Stable, 18 are entering Beta, and 10 have graduated to Alpha.

Release theme and logo

Kubernetes v1.30: Uwubernetes

Kubernetes v1.30 makes your clusters cuter! Kubernetes is built and released by thousands of people from all over the world and all walks of life. Most contributors are not being paid to do this; we build it for fun, to solve a problem, to learn something, or for the simple love of the community. Many of us found our homes, our friends, and our careers here. The Release Team is honored to be a part of the continued growth of Kubernetes. For the people who built it, for the people who release it, and for the furries who keep all of our clusters online, we present to you Kubernetes v1.30: Uwubernetes, the cutest release to date. The name is a portmanteau of "kubernetes" and "UwU," an emoticon used to indicate happiness or cuteness. We've found joy here, but we've also brought joy from our outside lives that helps to make this community as weird and wonderful and welcoming as it is. We're so happy to share our work with you. UwU

Improvements that graduated to stable in Kubernetes v1.30

This is a selection of some of the improvements that are now stable following the v1.30 release.

Robust VolumeManager reconstruction after kubelet restart (SIG Storage)

This is a volume manager refactoring that allows the kubelet to populate additional information about how existing volumes are mounted during the kubelet startup. In general, this makes volume cleanup after kubelet restart or machine reboot more robust. This does not bring any changes for users or cluster administrators. We used the feature process and the NewVolumeManagerReconstruction feature gate to be able to fall back to the previous behavior in case something went wrong. Now that the feature is stable, the feature gate is locked and cannot be disabled.

Prevent unauthorized volume mode conversion during volume restore (SIG Storage)

For Kubernetes 1.30, the control plane always prevents unauthorized changes to volume modes when restoring a snapshot into a PersistentVolume. As a cluster administrator, you'll need to grant permissions to the appropriate identity principals (for example: ServiceAccounts representing a storage integration) if you need to allow that kind of change at restore time.

Warning: Action required before upgrading. The prevent-volume-mode-conversion feature flag is enabled by default in the external-provisioner v4.0.0 and external-snapshotter v7.0.0. Volume mode changes will be rejected when creating a PVC from a VolumeSnapshot unless you perform the steps described in the "Urgent Upgrade Notes" sections for the external-provisioner v4.0.0 and the external-snapshotter v7.0.0. For more information on this feature, also read converting the volume mode of a Snapshot.

Pod Scheduling Readiness (SIG Scheduling)

Pod scheduling readiness graduates to stable this release, after being promoted to beta in Kubernetes v1.27.
This now-stable feature lets Kubernetes avoid trying to schedule a Pod that has been defined, when the cluster doesn't yet have the resources provisioned to allow actually binding that Pod to a node. That's not the only use case; the custom control on whether a Pod can be allowed to schedule also lets you implement quota mechanisms, security controls, and more. Crucially, marking these Pods as exempt from scheduling cuts the work that the scheduler would otherwise do, churning through Pods that can't or won't schedule onto the nodes your cluster currently has. If you have cluster autoscaling active, using scheduling gates doesn't just cut the load on the scheduler, it can also save money. Without scheduling gates, the autoscaler might otherwise launch a node that doesn't need to be started. In Kubernetes v1.30, by specifying (or removing) a Pod's .spec.schedulingGates, you can control when a Pod is ready to be considered for scheduling. This is a stable feature and is now formally part of the Kubernetes API definition for Pod. Min domains in PodTopologySpread (SIG Scheduling) The minDomains parameter for PodTopologySpread constraints graduates to stable this release, which allows you to define the minimum number of domains. This feature is designed to be used with Cluster Autoscaler. If you previously attempted to use it and there weren't enough domains already present, Pods would be marked as unschedulable. The Cluster Autoscaler would then provision node(s) in new domain(s), and you'd eventually get Pods spreading over enough domains. Go workspaces for k/k (SIG Architecture) The Kubernetes repo now uses Go workspaces. This should not impact end users at all, but does have an impact for developers of downstream projects. Switching to workspaces caused some breaking changes in the flags to the various k8s.io/code-generator tools. Downstream consumers should look at staging/src/k8s.io/code-generator/kube_codegen.sh to see the changes. For full details on the changes and reasons why Go workspaces was introduced, read Using Go workspaces in Kubernetes. Improvements that graduated to beta in Kubernetes v1.30 This is a selection of some of the improvements that are now beta following the v1.30 release. Node log query (SIG Windows) To help with debugging issues on nodes, Kubernetes v1.27 introduced a feature that allows fetching logs of services running on the node. To use the feature, ensure that the NodeLogQuery feature gate is enabled for that node, and that the kubelet configuration options enableSystemLogHandler and enableSystemLogQuery are both set to true. Following the v1.30 release, this is now beta (you still need to enable the feature to use it, though). On Linux the assumption is that service logs are available via journald. On Windows the assumption is that service logs are available in the application log provider. Logs are also available by reading files within /var/log/ (Linux) or C:\var\log\ (Windows). For more information, see the log query documentation. CRD validation ratcheting (SIG API Machinery) You need to enable the CRDValidationRatcheting feature gate to use this behavior, which then applies to all CustomResourceDefinitions in your cluster. Provided you enabled the feature gate, Kubernetes implements validation ratcheting for CustomResourceDefinitions. The API server is willing to accept updates to resources that are not valid after the update, provided that each part of the resource that failed to validate was not changed by the update operation. 
In other words, any invalid part of the resource that remains invalid must have already been wrong. You cannot use this mechanism to update a valid resource so that it becomes invalid. This feature allows authors of CRDs to confidently add new validations to the OpenAPIV3 schema under certain conditions. Users can update to the new schema safely without bumping the version of the object or breaking workflows. Contextual logging (SIG Instrumentation) Contextual Logging advances to beta in this release, empowering developers and operators to inject customizable, correlatable contextual details like service names and transaction IDs into logs through WithValues and WithName. This enhancement simplifies the correlation and analysis of log data across distributed systems, significantly improving the efficiency of troubleshooting efforts. By offering a clearer insight into the workings of your Kubernetes environments, Contextual Logging ensures that operational challenges are more manageable, marking a notable step forward in Kubernetes observability. Make Kubernetes aware of the LoadBalancer behaviour (SIG Network) The LoadBalancerIPMode feature gate is now beta and enabled by default. This feature allows you to set the .status.loadBalancer.ingress.ipMode for a Service with type set to LoadBalancer. The .status.loadBalancer.ingress.ipMode specifies how the load-balancer IP behaves. It may be specified only when the .status.loadBalancer.ingress.ip field is also specified. See more details about specifying IPMode of load balancer status. New alpha features Speed up recursive SELinux label change (SIG Storage) From the v1.27 release, Kubernetes already included an optimization that sets SELinux labels on the contents of volumes, using only constant time. Kubernetes achieves that speed up using a mount option. The slower legacy behavior requires the container runtime to recursively walk through the whole volume and apply SELinux labelling individually to each file and directory; this is especially noticeable for volumes with a large number of files and directories. Kubernetes 1.27 graduated this feature as beta, but limited it to ReadWriteOncePod volumes. The corresponding feature gate is SELinuxMountReadWriteOncePod. It's still enabled by default and remains beta in 1.30. Kubernetes 1.30 extends support for the SELinux mount option to all volumes as alpha, with a separate feature gate: SELinuxMount. This feature gate introduces a behavioral change when multiple Pods with different SELinux labels share the same volume. See the KEP for details. We strongly encourage users that run Kubernetes with SELinux enabled to test this feature and provide any feedback on the KEP issue.
Feature gate | Stage in v1.30 | Behavior change
SELinuxMountReadWriteOncePod | Beta | No
SELinuxMount | Alpha | Yes
Both feature gates SELinuxMountReadWriteOncePod and SELinuxMount must be enabled to test this feature on all volumes. This feature has no effect on Windows nodes or on Linux nodes without SELinux support. Recursive Read-only (RRO) mounts (SIG Node) Introducing Recursive Read-Only (RRO) Mounts in alpha this release, you'll find a new layer of security for your data. This feature lets you set volumes and their submounts as read-only, preventing accidental modifications. Imagine deploying a critical application where data integrity is key—RRO Mounts ensure that your data stays untouched, reinforcing your cluster's security with an extra safeguard. 
This is especially crucial in tightly controlled environments, where even the slightest change can have significant implications. Job success/completion policy (SIG Apps) From Kubernetes v1.30, indexed Jobs support .spec.successPolicy to define when a Job can be declared succeeded based on succeeded Pods. This allows you to define two types of criteria: succeededIndexes indicates that the Job can be declared succeeded when these indexes succeeded, even if other indexes failed. succeededCount indicates that the Job can be declared succeeded when the number of succeeded Indexes reaches this criterion. After the Job meets the success policy, the Job controller terminates the lingering Pods. Traffic distribution for services (SIG Network) Kubernetes v1.30 introduces the spec.trafficDistribution field within a Kubernetes Service as alpha. This allows you to express preferences for how traffic should be routed to Service endpoints. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences (such as routing to topologically closer endpoints). This can help optimize for performance, cost, or reliability. You can use this field by enabling the ServiceTrafficDistribution feature gate for your cluster and all of its nodes. In Kubernetes v1.30, the following field value is supported: PreferClose: Indicates a preference for routing traffic to endpoints that are topologically proximate to the client. The interpretation of "topologically proximate" may vary across implementations and could encompass endpoints within the same node, rack, zone, or even region. Setting this value gives implementations permission to make different tradeoffs, for example optimizing for proximity rather than equal distribution of load. You should not set this value if such tradeoffs are not acceptable. If the field is not set, the implementation (like kube-proxy) will apply its default routing strategy. See Traffic Distribution for more details. Graduations, deprecations and removals for Kubernetes v1.30 Graduated to stable This lists all the features that graduated to stable (also known as general availability). For a full list of updates including new features and graduations from alpha to beta, see the release notes. This release includes a total of 17 enhancements promoted to Stable: Container Resource based Pod Autoscaling Remove transient node predicates from KCCM's service controller Go workspaces for k/k Reduction of Secret-based Service Account Tokens CEL for Admission Control CEL-based admission webhook match conditions Pod Scheduling Readiness Min domains in PodTopologySpread Prevent unauthorised volume mode conversion during volume restore API Server Tracing Cloud Dual-Stack --node-ip Handling AppArmor support Robust VolumeManager reconstruction after kubelet restart kubectl delete: Add interactive(-i) flag Metric cardinality enforcement Field status.hostIPs added for Pod Aggregated Discovery Deprecations and removals Removed the SecurityContextDeny admission plugin, deprecated since v1.27 (SIG Auth, SIG Security, and SIG Testing) With the removal of the SecurityContextDeny admission plugin, the Pod Security Admission plugin, available since v1.25, is recommended instead. Release notes Check out the full details of the Kubernetes 1.30 release in our release notes. Availability Kubernetes 1.30 is available for download on GitHub. To get started with Kubernetes, check out these interactive tutorials or run local Kubernetes clusters using minikube. 
You can also easily install 1.30 using kubeadm. Release team Kubernetes is only possible with the support, commitment, and hard work of its community. Each release team is made up of dedicated community volunteers who work together to build the many pieces that make up the Kubernetes releases you rely on. This requires the specialized skills of people from all corners of our community, from the code itself to its documentation and project management. We would like to thank the entire release team for the hours spent hard at work to deliver the Kubernetes v1.30 release to our community. The Release Team's membership ranges from first-time shadows to returning team leads with experience forged over several release cycles. A very special thanks goes out to our release lead, Kat Cosgrove, for supporting us through a successful release cycle, advocating for us, making sure that we could all contribute in the best way possible, and challenging us to improve the release process. Project velocity The CNCF K8s DevStats project aggregates a number of interesting data points related to the velocity of Kubernetes and various sub-projects. This includes everything from individual contributions to the number of companies that are contributing and is an illustration of the depth and breadth of effort that goes into evolving this ecosystem. In the v1.30 release cycle, which ran for 14 weeks (January 8 to April 17), we saw contributions from 863 companies and 1391 individuals. Event update KubeCon + CloudNativeCon China 2024 will take place in Hong Kong, from 21 – 23 August 2024! You can find more information about the conference and registration on the event site. KubeCon + CloudNativeCon North America 2024 will take place in Salt Lake City, Utah, The United States of America, from 12 – 15 November 2024! You can find more information about the conference and registration on the event site. Upcoming release webinar Join members of the Kubernetes v1.30 release team on Thursday, May 23rd, 2024, at 9 A.M. PT to learn about the major features of this release, as well as deprecations and removals to help plan for upgrades. For more information and registration, visit the event page on the CNCF Online Programs site. Get involved The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below. Thank you for your continued feedback and support. Follow us on 𝕏 @Kubernetesio for the latest updates Join the community discussion on Discuss Join the community on Slack Post questions (or answer questions) on Stack Overflow Share your Kubernetes story Read more about what’s happening with Kubernetes on the blog Learn more about the Kubernetes Release Team View the full article
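As a brief illustration of the Pod scheduling readiness feature that graduated to stable in this release, below is a minimal sketch of a Pod held back by a scheduling gate; the gate name and image are hypothetical placeholders, not taken from the release notes.

apiVersion: v1
kind: Pod
metadata:
  name: gated-pod
spec:
  # the Pod stays unscheduled (status SchedulingGated) until this list is emptied
  schedulingGates:
  - name: example.com/quota-check   # hypothetical gate name
  containers:
  - name: app
    image: registry.example.com/app:1.0   # placeholder image

Removing the entry from .spec.schedulingGates (for example, via a controller or kubectl patch) releases the Pod to the scheduler.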
  13. Google Kubernetes Engine (GKE) has emerged as a leading container orchestration platform for deploying and managing containerized applications. But GKE is not just limited to running stateless microservices — its flexible design supports workloads that need to persist data as well, via tight integrations with persistent disk or blob storage products including Persistent Disk, Cloud Storage, and Filestore. And for organizations that need even stronger throughput and performance, there’s Hyperdisk, Google Cloud's next-generation block storage service that allows you to modify your capacity, throughput, and IOPS-related performance and tailor it to your workloads, without having to re-deploy your entire stack. Today, we’re excited to introduce support for Hyperdisk Balanced storage volumes on GKE, which joins the Hyperdisk Extreme and Hyperdisk Throughput options, and is a good fit for workloads that typically rely on persistent SSDs — for example, line-of-business applications, web applications, and databases. Hyperdisk Balanced is supported on 3rd+ generation instance types. For instance type compatibility, please reference this page. Understanding Hyperdisk First, let’s start with what it means to fine-tune your throughput, IOPS, and capacity with Hyperdisk. Tuning IOPS means refining the input/output operations per second (IOPS) performance of a storage device. Hyperdisk allows you to provision only the IOPS you need, and does not share it with other volumes on the same node. Tuning throughput means enhancing the amount of data or information that can be transferred or processed in a specified amount of time. Hyperdisk allows you to specify exactly how much throughput a given storage volume should have without limitations imposed by other volumes on the same node. Expanding capacity means you can increase the size of your storage volume. Hyperdisk can be provisioned for the exact capacity you need and extended as your storage needs grow. These Hyperdisk capabilities translate into the following benefits: First, you can transform your stateful environment through flexibility, ease of use, scalability and management — with potential cost savings to boot. Imagine the benefit of a storage area network environment without the management overhead. Second, you can build lower-cost infrastructure by rightsizing the machine types that back your GKE nodes, optimizing your GKE stack while integrating with GKE CI/CD, security and networking capabilities. Finally, you get predictability — the consistency that comes with fine-grained tuning for each individual node and its required IOPS. You can also use this to fine-tune for ML model building/training/deploying, as Hyperdisk removes the element of throughput and IOPS from all PDs sharing the same node bandwidth, placing it on the provisioned Hyperdisk beneath it. Compared with traditional persistent disk, Hyperdisk’s storage performance is decoupled from the node your application is running on. This gives you more flexibility with your IOPS and throughput, while reducing the possibility that overall storage performance would be impacted by a noisy neighbor. On GKE, the following types of Hyperdisk volumes are available: Hyperdisk Balanced - General-purpose volume type that is the best fit for most workloads, with up to 2.4 GBps of throughput and 160k IOPS. Ideal for line-of-business applications, web applications, databases, or boot disks. 
Hyperdisk Throughput - Optimized for cost-efficient high-throughput workloads, with up to 3 GBps throughput (>=128 KB IO size). Hyperdisk Throughput is targeted at scale-out analytics (e.g., Kafka, Hadoop, Redpanda) and other throughput-oriented cost-sensitive workloads, and is supported on GKE Autopilot and GKE Standard clusters. Hyperdisk Extreme - Specifically optimized for IOPS performance, such as large-scale databases that require high IOPS performance. Supported on Standard clusters only. Getting started with Hyperdisk on GKE To provision Hyperdisk, you first need to make sure your cluster has the necessary StorageClass loaded that references the disk. You can add the necessary IOPS/throughput to the storage class or go with the defaults, which are 3,600 IOPS and 140 MiBps (Docs). YAML to apply to the GKE cluster:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-storage
provisioner: pd.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: hyperdisk-balanced
  provisioned-throughput-on-create: "200Mi" # optional
  provisioned-iops-on-create: "5000" # optional

After you’ve configured the StorageClass, you can create a persistent volume claim that references it.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: postgres
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: balanced-storage
  resources:
    requests:
      storage: 1000Gi

Take control of your GKE storage with Hyperdisk To recap, Hyperdisk lets you take control of your cloud infrastructure storage, throughput, IOPS, and capacity, combining the speed of existing PD SSD volumes with the flexibility to fine-tune to your specific needs, by decoupling disks from the node that your workload is running on. For more, check out the following sessions from Next ‘24: Next generation storage: Designing storage for the future A primer on data on Kubernetes And be sure to check out these resources: How to create a Hyperdisk on GKE Data management solutions on GKE Contact your Google Cloud representative View the full article
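To round out the Hyperdisk example above, here is a minimal sketch of a Pod that consumes the postgres PersistentVolumeClaim defined in the entry; the container image and mount path are illustrative assumptions rather than part of the original article.

apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:16   # illustrative image choice
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres   # the PVC created from the balanced-storage StorageClass

Because the StorageClass uses volumeBindingMode: WaitForFirstConsumer, the Hyperdisk Balanced volume is only provisioned once this Pod is scheduled onto a compatible node.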
  14. Artificial Intelligence (AI) and Kubernetes are pillars of modern technology, each contributing significantly to innovation and efficiency. With AI adoption skyrocketing across industries, the demand for robust infrastructure to support AI workloads has surged. According to a recent report by Gartner, global spending on AI is projected to reach $297 billion by 2027 from $124 billion in 2022, with businesses increasingly investing in AI-driven solutions to gain a competitive edge. Concurrently, Kubernetes has emerged as the de facto standard for container orchestration, witnessing a remarkable growth trajectory. CNCF published the results of its latest microsurvey report on cloud-native FinOps and cloud financial management (CFM): Kubernetes has driven cloud spending up for 49% of respondents, while 28% stated their costs remain unchanged and 24% saved after migrating to Kubernetes. This intersection of AI and Kubernetes signifies a paradigm shift in technology, empowering organizations to harness the power of AI at scale while leveraging Kubernetes’ agility and scalability for seamless deployment and management. This post will examine Kubernetes’ role in managing end-to-end AI pipelines, including developing, training, and deploying AI models at each phase of the process. We’ll discuss how Kubernetes facilitates the creation of efficient, repeatable workflows by data scientists and machine learning engineers, increasing output and accelerating innovation in the AI space. Understanding Kubernetes Before discussing Kubernetes’ importance in AI, let’s look at its definition and how it works. Kubernetes, often known as K8s, is an open-source container orchestration tool first developed by Google. It simplifies and automates containerized applications’ scaling, deployment, and management. Containers allow for the lightweight, portable packaging and deployment of applications along with their dependencies and customizations. Kubernetes abstracts away infrastructure concerns by providing a platform for delivering and managing containerized workloads. Empowering Innovation: The Synergy Between AI and Kubernetes Artificial intelligence (AI) and Kubernetes work hand in hand at the forefront of modern technology. AI is transforming industries through intelligent automation and decision-making. Meanwhile, Kubernetes provides the dependable infrastructure for deploying, scaling, and managing AI applications. Kubernetes facilitates the seamless coordination of AI workloads across several environments, optimizing resource usage and guaranteeing dependability. In return, AI leverages the scalability and agility of Kubernetes to provide innovative solutions that increase productivity and encourage business growth. Kubernetes and AI work together to produce a dynamic synergy that allows businesses to take full advantage of AI technologies in the rapidly evolving digital ecosystem. End-to-End AI Pipelines AI pipelines are the complex and interconnected procedures utilized in creating, training, and applying AI models. These pipelines typically include data preprocessing, model training, assessment, tuning, and deployment. Effective management of these pipelines at all levels requires automation and coordination. Kubernetes provides the infrastructure needed to orchestrate end-to-end AI pipelines with ease. Let us discuss how Kubernetes facilitates AI model development, training, and deployment. 
Development Phase During the development stage of an AI project, data scientists and machine learning engineers experiment with various algorithms, datasets, and model architectures to construct and enhance AI models. Setting up development environments is made easier by Kubernetes, which isolates every stage of the AI pipeline behind containers. Developers can define Kubernetes manifests, representing application components’ desired state, such as networking configurations, volumes, and containers. Subsequently, Kubernetes automatically schedules and starts these containers across the cluster, ensuring consistent and repeatable development environments. Training Phase After the model design is finished, the model is trained on large datasets. Training deep learning models sometimes requires a lot of processing power, such as GPUs or TPUs, for speedier processing. Two of Kubernetes’ advantages are its ability to independently scale resources in response to demand and distribute computational jobs throughout the cluster. Data scientists can use Kubernetes’ horizontal scaling characteristics to train many models in parallel and save significant training time. Furthermore, Kubernetes makes resource limits and quotas easier to implement, ensuring fair resource allocation and preventing conflicts between multiple teams or projects. Evaluation and Tuning After training, AI models must be evaluated using validation datasets to determine their performance. By integrating with tools like Kubeflow and TensorFlow Extended (TFX), Kubernetes enables hyperparameter tuning and automatic model evaluation. These frameworks provide prebuilt components for creating and managing AI pipelines on Kubernetes clusters. Data scientists may speed up the iterative process of improving model performance by developing workflows that automate model review, model selection, and hyperparameter tuning. Deployment Phase Once a model is sufficiently accurate, it must be applied to predict new data in real-world scenarios. The Kubernetes platform facilitates the deployment of AI models by eliminating infrastructure-related issues and providing tools for container orchestration and service discovery. Data scientists can bundle trained models into container images using platforms like Docker or Kubernetes’ built-in support for custom resources like Custom Resource Definitions (CRDs). When these containerized models are deployed as microservices, they may be accessed via RESTful APIs or gRPC endpoints. Scaling and Monitoring In industrial applications, AI models could face varying workloads and demand levels. Thanks to Kubernetes’ auto-scaling functionality, resources can be dynamically changed based on real-time data, such as CPU usage, RAM consumption, or other application-specific indicators. This ensures optimal effectiveness and efficient use of resources, particularly during periods of high demand. Furthermore, Kubernetes provides information about the health and functionality of AI applications through easy integration with logging and monitoring tools such as Prometheus and Grafana. Data scientists can set up alerts and dashboards to monitor key indicators and respond quickly to any anomalies or issues. Reproducibility and Portability One of Kubernetes’ key advantages for AI pipelines is its reproducibility and portability. Kubernetes manifests are used to declaratively specify the desired state of the application, including dependencies, configurations, and environment variables. 
These manifests can be version-managed using Git or other version control systems, which promotes collaboration and repeatability in various settings. Furthermore, Kubernetes abstracts away the underlying infrastructure, simplifying the installation of AI pipelines on any cloud provider or internal data center with minimal to no adjustments. In conclusion, Kubernetes is essential to coordinating end-to-end AI pipelines, which include creating, training, and implementing AI models at every stage of the process. By automating container orchestration and abstracting away infrastructure complexities, Kubernetes frees data scientists and machine learning engineers to concentrate on innovation rather than infrastructure maintenance. Organizations may use Kubernetes to provide AI-powered apps that match the demands of today’s changing business landscape, increase productivity, and accelerate the speed of AI innovation. The post Kubernetes & its Role in AI: Orchestrating End-to-End AI Pipelines appeared first on Amazic. View the full article
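As a rough sketch of the training-phase pattern described above, the Job below requests GPU capacity through Kubernetes resource limits; the image, command, and GPU count are hypothetical, and scheduling onto GPUs assumes a device plugin (such as NVIDIA's) is installed in the cluster.

apiVersion: batch/v1
kind: Job
metadata:
  name: model-training   # hypothetical name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: registry.example.com/trainer:latest   # placeholder image
        command: ["python", "train.py"]              # placeholder entrypoint
        resources:
          requests:
            cpu: "4"
            memory: 16Gi
          limits:
            nvidia.com/gpu: 1   # extended resource exposed by the GPU device plugin
            memory: 16Gi

Running several such Jobs in parallel is one way to use the horizontal scaling the article mentions for training multiple models at once.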
  15. Kubernetes has transformed container orchestration, providing an effective framework for delivering and managing applications at scale. However, efficient storage management is essential to guarantee the dependability, security, and efficiency of your Kubernetes clusters. Benefits like data loss prevention, regulatory compliance, and maintaining operational continuity while mitigating threats underscore the importance of security and dependability. This post will examine the top 10 best practices for Kubernetes storage, emphasizing encryption, access control, and safeguarding storage components. Kubernetes Storage Kubernetes storage is essential to contemporary cloud-native setups because it makes data persistence in containerized apps more effective. It provides a dependable and scalable storage resource management system that guarantees data persistence through migrations and restarts of containers. Among other capabilities, Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) give Kubernetes a versatile abstraction layer for managing storage. By providing dynamic provisioning of storage volumes catered to particular workload requirements, storage classes further improve flexibility. Organizations can build and manage stateful applications with agility, scalability, and resilience in various computing settings by utilizing Kubernetes storage capabilities. 1. Data Encryption Sensitive information kept in Kubernetes clusters must be protected with data encryption. Use encryption tools like Kubernetes Secrets to safely store sensitive information like SSH keys, API tokens, and passwords. Encryption both in transit and at rest is also used to further protect data while it is being stored and transmitted between nodes. 2. Use Secrets Management Tools Steer clear of hardcoding private information straight into Kubernetes manifests. Instead, use powerful secrets management solutions like Vault or Kubernetes Secrets to securely maintain and distribute secrets throughout your cluster. This guarantees that private information is encrypted and available only to approved users and applications. 3. Implement Role-Based Access Control (RBAC) RBAC allows you to enforce fine-grained access controls on your Kubernetes clusters. Define roles and permissions to limit access to storage resources using the least privilege concept. This lowers the possibility of data breaches and unauthorized access by preventing unauthorized users or apps from accessing or changing crucial storage components. 4. Secure Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) Ensure that claims and persistent volumes are adequately secured to avoid tampering or unwanted access. Put security rules in place to limit access to particular namespaces or users and turn on encryption for information on persistent volumes. PVs and PVCs should have regular audits and monitoring performed to identify and address any security flaws or unwanted entry attempts. 5. Enable Network Policies To manage network traffic between pods and storage resources, use Kubernetes network policies. To guarantee that only authorized pods and services may access storage volumes and endpoints, define firewall rules restricting communication to and from storage components. This reduces the possibility of data exfiltration and network-based assaults and prevents unauthorized network access. 6. Enable Role-Based Volume Provisioning Utilize Kubernetes’ dynamic volume provisioning features to automate storage volume creation and management. 
To limit users’ ability to build or delete volumes based on their assigned roles and permissions, utilize role-based volume provisioning. This guarantees the effective and safe allocation of storage resources and helps prevent resource abuse. 7. Utilize Pod Security Policies To specify and implement security restrictions on pods’ access to storage resources, implement pod security policies. To manage pod rights, host resource access, and storage volume interactions, specify security policies. By implementing stringent security measures, you can reduce the possibility of privilege escalation, container escapes, and illegal access to storage components. 8. Regularly Update and Patch Kubernetes Components Monitor security flaws by regularly patching and updating Kubernetes components, including storage drivers and plugins. Keep your storage infrastructure safe from new attacks and vulnerabilities by subscribing to security advisories and adhering to best practices for Kubernetes cluster management. 9. Monitor and Audit Storage Activity To keep tabs on storage activity in your Kubernetes clusters, put extensive logging, monitoring, and auditing procedures in place. To proactively identify security incidents or anomalies, monitor access logs, events, and metrics on storage components. Utilize centralized logging and monitoring systems to see what’s happening with storage in your cluster. 10. Conduct Regular Security Audits and Penetration Testing Conduct comprehensive security audits and penetration tests regularly to evaluate the security posture of your Kubernetes storage system. Find and fix any security holes, incorrect setups, and deployment flaws in your storage system before hackers can exploit them. Work with security professionals and use automated security technologies to thoroughly audit your Kubernetes clusters. Considerations Before putting suggestions for Kubernetes storage into practice, take into account the following: Evaluate Security Requirements: Match storage options with compliance and corporate security requirements. Assess Performance Impact: Recognize the potential effects that resource usage and application performance may have from access controls, encryption, and security rules. Identify Roles and Responsibilities: Clearly define who is responsible for what when it comes to managing storage components in Kubernetes clusters. Plan for Scalability: Recognize the need for scalability and the possible maintenance costs related to implementing security measures. Make Monitoring and upgrades a Priority: To ensure that security measures continue to be effective over time, place a strong emphasis on continual monitoring, audits, and upgrades. Effective storage management is critical for ensuring the security, reliability, and performance of Kubernetes clusters. By following these ten best practices for Kubernetes storage, including encryption, access control, and securing storage components, you can strengthen the security posture of your Kubernetes environment and mitigate the risk of data breaches, unauthorized access, and other security threats. Stay proactive in implementing security measures and remain vigilant against emerging threats to safeguard your Kubernetes storage infrastructure effectively. The post Mastering Kubernetes Storage: 10 Best Practices for Security and Efficiency appeared first on Amazic. View the full article
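As one possible illustration of practice #3 (RBAC) above, the sketch below grants a team read-only access to PersistentVolumeClaims in its own namespace; the namespace, group, and role names are hypothetical.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pvc-read-only        # hypothetical role name
  namespace: team-a          # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pvc-read-only-binding
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers    # hypothetical group managed by your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pvc-read-only
  apiGroup: rbac.authorization.k8s.io

Write access to PVCs can then be reserved for a narrower role, keeping volume creation and deletion behind the least-privilege boundary the article recommends.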
  16. We are excited to announce that Amazon EMR on EKS simplified the authentication and authorization user experience by integrating with Amazon EKS's improved cluster access management controls. With this launch, Amazon EMR on EKS will use EKS access management controls to automatically obtain the necessary permissions to run Amazon EMR applications on the EKS cluster. View the full article
  17. Introduction Snapchat is an app that hundreds of millions of people around the world use to communicate with their close friends. The app is powered by microservice architectures deployed in Amazon Elastic Kubernetes Service (Amazon EKS) and datastores such as Amazon CloudFront, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and Amazon ElastiCache. This post explains how Snap builds its microservices, leveraging Amazon EKS with AWS Identity and Access Management. It also discusses how Snap protects its K8s resources with intelligent threat detection offered by GuardDuty that is augmented with Falco and in-house tooling to secure Snap’s cloud-native service mesh platform. The following figure (Figure 1) shows the main data flow when users send and receive snaps. To send a snap, the mobile app calls Snap API Gateway, which routes the call to a Media service that persists the sender's message media in S3 (Steps 1-3). Next, the Friend microservice validates the sender’s permission to Snap the recipient user by querying the messaging core service (MCS), which checks whether the recipient is a friend (Steps 4-6), and the conversation is stored in Snap DB, powered by DynamoDB (Steps 7-8). To receive a snap, the mobile app calls MCS to get the message metadata, such as the pointer to the media file. It then calls the Media microservice to load the media file from the content system that persists user data, powered by CloudFront and Amazon S3 (Steps 9-11). Figure 1: Snap’s end-to-end data flow when users send and receive snaps The original Snap service mesh design included a single-tenant microservice per EKS cluster. Snap discovered, however, that managing thousands of clusters added an operational burden as microservices grew. Additionally, they discovered that many environments were underused and unnecessarily consuming AWS account resources such as IAM roles and policies. This required enabling microservices to share clusters and redefining tenant isolation to meet security requirements. Finally, Snap wanted to limit access to microservice data and storage while keeping them centralized in a network account meshed with Google’s cloud. The following figure illustrates a Kubernetes-based microservice, Friends and Users, deployed in Amazon EKS or Google Kubernetes Engine (GKE). Snap users reach Snap’s API Gateway through Envoy. Switchboard, Snap’s mesh service configuration panel, updates Edge Envoy endpoints with available microservice resources after deploying them. Figure 2 – Snap’s high-level mesh design Bootstrap The purpose of this stage is the preparation and implementation of a secure multi-cloud compute provisioning system. Snap uses Kubernetes clusters as a construct that defines an environment that hosts one or more microservices, such as Friends and Users in the first figure. Snap’s security bootstrap includes three layers: authentication, authorization, and admission control when designing a Kubernetes-based multi-cloud. Snap uses IAM roles for Kubernetes service accounts (IRSA) to provision fine-grained service identities for microservices running in shared EKS clusters that allow access to AWS services such as Amazon S3, DynamoDB, etc. For operator access scoped to the K8s namespace, Snap built a tool to manage K8s RBAC that maps K8s roles to IAM, allowing developers to perform service operations following the principle of least privilege. 
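As a generic sketch of the IRSA pattern described above (not Snap's actual configuration), a Kubernetes ServiceAccount is annotated with the IAM role its pods may assume; the namespace, account ID, and role name below are placeholders.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: friends
  namespace: friends   # placeholder namespace
  annotations:
    # IAM role assumed via the cluster's OIDC provider; placeholder ARN
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/friends-dynamodb-role

Pods that reference this service account receive temporary credentials scoped to that role, so access to a DynamoDB table in a separate data account can be granted without relying on broad node instance roles.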
Beyond RBAC and IRSA, Snap wanted to impose deployment validations such as making sure containers are instantiated by approved image registries (such as Amazon Elastic Container Registry (Amazon ECR) or images built and signed by approved CI systems) as well as preventing containers from running with elevated permissions. To accomplish this, Snap built admission controller webhooks. Build-time Snap believes in empowering its engineers to architect microservices autonomously within K8s constructs. Snap’s goal was to maximize Amazon EKS security benefits while abstracting K8s semantics. Specifically, 1/ the security of the Cloud: safeguarding the infrastructure that runs Amazon EKS, object-stores (Amazon S3), data-stores (KeyDB, ElastiCache, and DynamoDB), and the network that interconnects them. The security in the Cloud includes 2/ protecting the K8s cluster API server and etcd from malicious access, and finally, 3/ protecting Snap’s applications’ RBAC, network policies, data encryption, and containers. Switchboard – Snap’s mesh service configuration panel Snap built a configuration hub called Switchboard to provide a single control panel across AWS and GCP to create K8s clusters. Microservice owners can define environments in regions with specific compute types offered by cloud providers. Switchboard also enables service owners to follow approval workflows to establish trust for service-to-service authentication and specify public routes and other service configurations. It allows service owners to manage service dependencies and traffic routes between K8s clusters. Switchboard presents a simplified configuration model based on the environments. It manages metadata, such as the service owner, team email, and on-call paging information. Snap wanted to empower tenants to control access to microservice data objects (images, audio, and video) and metadata stores such as databases and cache stores, so they deployed the data stores in separate data accounts controlled by IAM roles and policies in those accounts. Snap needed to centralize microservice network paths to mesh with GCP resources. Therefore, Snap deployed the microservices in a centralized account using IAM roles for service accounts that assume roles in the tenants’ data AWS accounts. The following figure shows how multiple environments (Kubernetes clusters) host three tenants using two different IRSA roles. Friends’ Service A can read and write to a dedicated DynamoDB table deployed in a separate AWS account. Similarly, MCS’ Service B can get and cache sessions or friends in ElastiCache. One of Snap’s design principles was to maximize autonomy while maintaining their desired level of isolation between environments, all while minimizing operational overhead. Snap chose Kubernetes service accounts as the minimal isolation level. Amazon EKS support for IRSA allowed Snap to leverage OIDC to simplify the process of granting IAM permissions to application pods. Snap also uses RBAC to limit access to K8s cluster resources and secure cluster users’ authentication. Snap is considering adopting Amazon EKS Pod Identities to reuse associations when running the same application in multiple clusters. This is done by applying identical associations to each cluster without modifying the role trust policy. Deployment-time Cluster access by human operators AWS IAM users and roles are currently managed by Snap, which generates policies based on business requirements. Operators use Switchboard to request access to their microservice. 
Switchboard maps an IAM user to a cluster RBAC policy that grants access to Kubernetes objects. Snap is evaluating AWS Identity Center to allow Switchboard to federate AWS Single Sign-On (SSO) with a central identity provider (IdP), enabling cluster operators to have least-privilege access using cluster RBAC policies, enforced through AWS IAM. Isolation strategy Snap chose to isolate K8s cluster resources by namespace, limiting container permissions with IAM roles for service accounts and CNI network policies. In addition, Snap provisions separate pod identities for add-ons such as CNI, Cluster-AutoScaler, and FluentD. Add-ons use separate IAM policies through IRSA rather than overly permissive EC2 instance IAM roles. Network partitioning Snap’s mesh defines rules that restrict or permit network traffic between microservice pods with Amazon VPC CNI network policies. Snap wanted to minimize IP exhaustion caused by IPv4 address space limitations due to its massive scale. Instead of working around IPv4 limitations using Kubernetes IPv4/IPv6 dual-stack, Snap wanted to migrate gracefully to IPv6. Snap can connect IPv4-based Amazon EKS clusters to IPv6 clusters using Amazon EKS IPv6 support and Amazon VPC CNI. Container hardening Snap built an admission controller webhook to audit and enforce pod security context to prevent containers from running with elevated permissions (RunAs) or accessing volumes at the cluster or namespace level. Snap validates that workloads don’t use configurations that break container isolation such as hostIPC, hostNetwork, and hostPort. Figure 4 – Snap’s admission controller service Network policies Kubernetes Network Policies enable you to define and enforce rules for traffic flow between pods. Policies act as a virtual firewall, which allows you to segment and secure your cluster by specifying network traffic rules for pods, namespaces, IP addresses, and ports. Amazon EKS extends and simplifies native support for network policies in Amazon VPC CNI and Amazon EC2, security groups, and network access control lists (NACLs) through the upstream Kubernetes Network Policy API. Run-time Audit logs Snap needs to audit system activities to enhance compliance, intrusion detection, and policy validation. This is to track unauthorized access, policy violations, suspicious activities, and incident responses. Snap uses Amazon EKS control plane logging that ingests API server, audit, authenticator, controller manager, and scheduler logs into CloudWatch. It also uses Amazon CloudTrail for cross-AWS services access and fluentd to ingest application logging to CloudWatch and Google’s operations suite. Runtime security monitoring Snap has begun using GuardDuty EKS Protection. This helps Snap monitor EKS cluster control plane activity by analyzing Amazon EKS audit logs to identify unauthorized and malicious access patterns. This functionality, combined with their admission controller events, provides coverage of cluster changes. For runtime monitoring, Snap uses the open source Falco agent to monitor EKS workloads in the Snap service mesh. GuardDuty findings are contextualized by Falco rules based on running container processes. This context helps to identify cluster tenants with whom to triage the findings. Falco agents support Snap’s runtime monitoring goals and deliver consistent reporting. Snap complements GuardDuty with Falco to ensure changes are not made to a running container by monitoring and analyzing container syscalls (container drift detection rule). 
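To illustrate the kind of CNI network policy mentioned above (a generic sketch, not Snap's actual policy), the example below allows ingress to the Friends pods only from the MCS namespace; the namespaces, labels, and port are placeholders.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: friends-allow-mcs-only   # placeholder name
  namespace: friends             # placeholder namespace
spec:
  podSelector:
    matchLabels:
      app: friends               # placeholder label
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: mcs   # placeholder namespace name
    ports:
    - protocol: TCP
      port: 8080                 # placeholder port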
Conclusion Snap’s cloud infrastructure has evolved from running a monolith inside Google App Engine to microservices deployed in Kubernetes across AWS and GCP. This streamlined architecture helped improve Snapchat’s reliability. Snap’s Kubernetes multi-tenant vision needed abstraction of cloud provider security semantics such as AWS security features to comply with strict security and privacy standards. This blog reviewed the methods and systems used to implement a secure compute and data platform on Amazon EKS and Amazon data-stores. This included bootstrapping, building, deploying, and running Snap’s workloads. Snap is not stopping here. Learn more about Snap and our collaboration with Snap. View the full article
  18. Today, we are announcing the general availability of provider-defined functions in the AWS, Google Cloud, and Kubernetes providers in conjunction with the HashiCorp Terraform 1.8 launch. This release represents yet another step forward in our unique approach to ecosystem extensibility. Provider-defined functions will allow anyone in the Terraform community to build custom functions within providers and extend the capabilities of Terraform. Introducing provider-defined functions Previously, users relied on a handful of built-in functions in the Terraform configuration language to perform a variety of tasks, including numeric calculations, string manipulations, collection transformations, validations, and other operations. However, the Terraform community needed more capabilities than the built-in functions could offer. With the release of Terraform 1.8, providers can implement custom functions that you can call from the Terraform configuration. The schema for a function is defined within the provider's schema using the Terraform provider plugin framework. To use a function, declare the provider as a required_provider in the terraform{} block:

terraform {
  required_version = ">= 1.8.0"
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "2.5.1"
    }
  }
}

Provider-defined functions can perform multiple tasks, including: Transforming existing data Parsing combined data into individual, referenceable components Building combined data from individual components Simplifying validations and assertions To access a provider-defined function, reference the provider:: namespace with the local name of the Terraform Provider. For example, you can use the direxists function by including provider::local::direxists() in your Terraform configuration. Below you’ll find several examples of new provider-defined functions in the officially supported AWS, Google Cloud, and Kubernetes providers. Terraform AWS provider The 5.40 release of the Terraform AWS provider includes its first provider-defined functions to parse and build Amazon Resource Names (ARNs), simplifying Terraform configurations where ARN manipulation is required. The arn_parse provider-defined function is used to parse an ARN and return an object of individual referenceable components, such as a region or account identifier. For example, to get the AWS account ID from an Amazon Elastic Container Registry (ECR) repository, use the arn_parse function to retrieve the account ID and set it as an output:

# create an ECR repository
resource "aws_ecr_repository" "hashicups" {
  name = "hashicups"
  image_scanning_configuration {
    scan_on_push = true
  }
}

# output the account ID of the ECR repository
output "hashicups_ecr_repository_account_id" {
  value = provider::aws::arn_parse(aws_ecr_repository.hashicups.arn).account_id
}

Running terraform apply against the above configuration outputs the AWS account ID:

Apply complete! Resources: 2 added, 0 changed, 0 destroyed.

Outputs:

hashicups_ecr_repository_account_id = "751192555662"

Without the arn_parse function, you would need to define and test a combination of built-in Terraform functions to split the ARN and reference the proper index or define a regular expression to match on a substring. The function handles the parsing for you in a concise manner so that you do not have to worry about doing this yourself. The AWS provider also includes a new arn_build function that builds an ARN from individual attributes and returns it as a string. 
This provider-defined function can create an ARN that you cannot reference from another resource. For example, you may want to allow another account to pull images from your ECR repository. The arn_build function below constructs an ARN for an IAM policy using an account ID:

# allow another account to pull from the ECR repository
data "aws_iam_policy_document" "cross_account_pull_ecr" {
  statement {
    sid    = "AllowCrossAccountPull"
    effect = "Allow"
    principals {
      type = "AWS"
      identifiers = [
        provider::aws::arn_build("aws", "iam", "", var.cross_account_id, "root"),
      ]
    }
    actions = [
      "ecr:BatchGetImage",
      "ecr:GetDownloadUrlForLayer",
    ]
  }
}

The arn_build function helps to guide and simplify the process of combining substrings to form an ARN, and it improves readability compared to using string interpolation. Without it, you'd have to look up the exact ARN structure in the AWS documentation and manually test it. Terraform Google Cloud provider The 5.23 release of the Terraform Google Cloud provider adds a simplified way to get regions, zones, names, and projects from the IDs of resources that aren’t managed by your Terraform configuration. Provider-defined functions can now help parse Google IDs when adding an IAM binding to a resource that’s managed outside of Terraform:

resource "google_cloud_run_service_iam_member" "example_run_invoker_jane" {
  member   = "user:jane@example.com"
  role     = "run.invoker"
  service  = provider::google::name_from_id(var.example_cloud_run_service_id)
  location = provider::google::location_from_id(var.example_cloud_run_service_id)
  project  = provider::google::project_from_id(var.example_cloud_run_service_id)
}

The Google Cloud provider also includes a new region_from_zone provider-defined function that helps obtain region names from a given zone (e.g. "us-west1" from "us-west1-a"). This simple string processing could be achieved in multiple ways using Terraform's built-in functions previously, but the new function simplifies the process:

locals {
  zone = "us-central1-a"

  # ways to derive the region "us-central1" using built-in functions
  region_1 = join("-", slice(split("-", local.zone), 0, 2))
  region_2 = substr(local.zone, 0, length(local.zone) - 2)

  # our new region_from_zone function makes this easier!
  region_3 = provider::google::region_from_zone(local.zone)
}

Terraform Kubernetes provider The 2.28 release of the Terraform Kubernetes provider includes provider-defined functions for encoding and decoding Kubernetes manifests into Terraform, making it easier for practitioners to work with the kubernetes_manifest resource. Users that have a Kubernetes manifest in YAML format can use the manifest_decode function to convert it into a Terraform object. The example below shows how to use the manifest_decode function by referring to a Kubernetes manifest in YAML format embedded in the Terraform configuration:

locals {
  manifest = <<-EOT
    kind: Namespace
    apiVersion: v1
    metadata:
      name: test
      labels:
        name: test
  EOT
}

resource "kubernetes_manifest" "example" {
  manifest = provider::kubernetes::manifest_decode(local.manifest)
}

If you prefer to decode a YAML file instead of using an embedded YAML format, you can do so by combining the built-in file function with the manifest_decode function. 
$ cat manifest.yaml
---
kind: Namespace
apiVersion: v1
metadata:
  name: test
  labels:
    name: test

resource "kubernetes_manifest" "example" {
  manifest = provider::kubernetes::manifest_decode(file("${path.module}/manifest.yaml"))
}

If your manifest YAML contains multiple Kubernetes resources, you may use the manifest_decode_multi function to decode them into a list which can then be used with the for_each attribute on the kubernetes_manifest resource:

$ cat manifest.yaml
---
kind: Namespace
apiVersion: v1
metadata:
  name: test-1
  labels:
    name: test-1
---
kind: Namespace
apiVersion: v1
metadata:
  name: test-2
  labels:
    name: test-2

resource "kubernetes_manifest" "example" {
  for_each = {
    for m in provider::kubernetes::manifest_decode_multi(file("${path.module}/manifest.yaml")) : m.metadata.name => m
  }
  manifest = each.value
}

Getting started with provider-defined functions Provider-defined functions allow Terraform configurations to become more expressive and readable by declaring practitioner intent and reducing complex, repetitive expressions. To learn about all of the new launch-day provider-defined functions, please review the documentation and changelogs of the aforementioned providers: Terraform AWS provider Terraform Google provider Terraform Kubernetes provider Review our Terraform Plugin Framework documentation to learn more about how provider-defined functions work and how you can make your own. We are thankful to our partners and community members for their valuable contributions to the HashiCorp Terraform ecosystem. View the full article
  19. Multi-cluster Ingress (MCI) is an advanced feature typically used in cloud computing environments that enables the management of ingress (the entry point for external traffic into a network) across multiple Kubernetes clusters. This functionality is especially useful for applications that are deployed globally across several regions or clusters, offering a unified method to manage access to these applications. MCI simplifies the process of routing external traffic to the appropriate cluster, enhancing both the reliability and scalability of applications. Here are key features and benefits of Multi-cluster Ingress: Global Load Balancing: MCI can intelligently route traffic to different clusters based on factors like region, latency, and health of the service. This ensures users are directed to the nearest or best-performing cluster, improving the overall user experience. Centralized Management: It allows for the configuration of ingress rules from a single point, even though these rules are applied across multiple clusters. This simplification reduces the complexity of managing global applications. High Availability and Redundancy: By spreading resources across multiple clusters, MCI enhances the availability and fault tolerance of applications. If one cluster goes down, traffic can be automatically rerouted to another healthy cluster. Cross-Region Failover: In the event of a regional outage or a significant drop in performance within one cluster, MCI can perform automatic failover to another cluster in a different region, ensuring continuous availability of services. Cost Efficiency: MCI helps optimize resource utilization across clusters, potentially leading to cost savings. Traffic can be routed to clusters where resources are less expensive or more abundant. Simplified DNS Management: Typically, MCI solutions offer integrated DNS management, automatically updating DNS records based on the health and location of clusters. This removes the need for manual DNS management in a multi-cluster setup. How does Multi-cluster Ingress (MCI) work? Multi-cluster Ingress (MCI) works by managing and routing external traffic into applications running across multiple Kubernetes clusters. This process involves several components and steps to ensure that traffic is efficiently and securely routed to the appropriate destination based on predefined rules and policies. Here’s a high-level overview of how MCI operates: 1. Deployment Across Multiple Clusters Cluster Preparation: You deploy your application across multiple Kubernetes clusters, often spread across different geographical locations or cloud regions, to ensure high availability and resilience. Ingress Configuration: Each cluster has its own set of resources and services that need to be exposed externally. With MCI, you configure ingress resources that are aware of the multi-cluster environment. 2. Central Management and Configuration Unified Ingress Control: A central control plane is used to manage the ingress resources across all participating clusters. This is where you define the rules for how external traffic should be routed to your services. DNS and Global Load Balancer Setup: MCI integrates with global load balancers and DNS systems to direct users to the closest or most appropriate cluster based on various criteria like location, latency, and the health of the clusters. 3. Traffic Routing Initial Request: When a user makes a request to access the application, the DNS resolution directs the request to the global load balancer. 
Global Load Balancing: The global load balancer evaluates the request against the configured routing rules and the current state of the clusters (e.g., load, health). It then selects the optimal cluster to handle the request. Cluster Selection: The criteria for cluster selection can include geographic proximity to the user, the health and capacity of the clusters, and other custom rules defined in the MCI configuration. Request Forwarding: Once the optimal cluster is selected, the global load balancer forwards the request to an ingress controller in that cluster. Service Routing: The ingress controller within the chosen cluster then routes the request to the appropriate service based on the path, host, or other headers in the request. 4. Health Checks and Failover Continuous Monitoring: MCI continuously monitors the health and performance of all clusters and their services. This includes performing health checks and monitoring metrics to ensure each service is functioning correctly. Failover and Redundancy: In case a cluster becomes unhealthy or is unable to handle additional traffic, MCI automatically reroutes traffic to another healthy cluster, ensuring uninterrupted access to the application. 5. Scalability and Maintenance Dynamic Scaling: As traffic patterns change or as clusters are added or removed, MCI dynamically adjusts routing rules and load balancing to optimize performance and resource utilization. Configuration Updates: Changes to the application or its deployment across clusters can be managed centrally through the MCI configuration, simplifying updates and maintenance. Example Deployment YAML for Multi-cluster Ingress with FrontendConfig and BackendConfig This example includes: A simple web application deployment. A service to expose the application within the cluster. A MultiClusterService to expose the service across clusters. A MultiClusterIngress to expose the service externally with FrontendConfig and BackendConfig. The post What is Multi-cluster Ingress (MCI) appeared first on DevOpsSchool.com. View the full article
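As a rough, hedged sketch of the MultiClusterService and MultiClusterIngress objects listed above (assuming GKE's networking.gke.io/v1 API; the app name, namespace, ports, and the FrontendConfig/BackendConfig names are hypothetical, and the annotation usage should be verified against the current GKE Multi Cluster Ingress documentation):

apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: web-mcs
  namespace: web
  annotations:
    cloud.google.com/backend-config: '{"default": "web-backendconfig"}'   # attach a BackendConfig (hypothetical name)
spec:
  template:
    spec:
      selector:
        app: web               # matches the Deployment's Pods in each member cluster
      ports:
      - name: http
        protocol: TCP
        port: 8080
        targetPort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: web-ingress
  namespace: web
  annotations:
    networking.gke.io/frontend-config: web-frontendconfig   # attach a FrontendConfig (hypothetical name)
spec:
  template:
    spec:
      backend:
        serviceName: web-mcs   # the derived Services created from the MultiClusterService
        servicePort: 8080

The FrontendConfig and BackendConfig themselves would be defined as separate resources referenced by the annotations above.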
  20. Author: Andrei Kvapil (Ænix) Approaching the most interesting phase, this article delves into running Kubernetes within Kubernetes. Technologies such as Kamaji and Cluster API are highlighted, along with their integration with KubeVirt. Previous discussions have covered preparing Kubernetes on bare metal and how to turn Kubernetes into virtual machines management system. This article concludes the series by explaining how, using all of the above, you can build a full-fledged managed Kubernetes and run virtual Kubernetes clusters with just a click. First up, let's dive into the Cluster API. Cluster API Cluster API is an extension for Kubernetes that allows the management of Kubernetes clusters as custom resources within another Kubernetes cluster. The main goal of the Cluster API is to provide a unified interface for describing the basic entities of a Kubernetes cluster and managing their lifecycle. This enables the automation of processes for creating, updating, and deleting clusters, simplifying scaling, and infrastructure management. Within the context of Cluster API, there are two terms: management cluster and tenant clusters. Management cluster is a Kubernetes cluster used to deploy and manage other clusters. This cluster contains all the necessary Cluster API components and is responsible for describing, creating, and updating tenant clusters. It is often used just for this purpose. Tenant clusters are the user clusters or clusters deployed using the Cluster API. They are created by describing the relevant resources in the management cluster. They are then used for deploying applications and services by end-users. It's important to understand that physically, tenant clusters do not necessarily have to run on the same infrastructure with the management cluster; more often, they are running elsewhere. A diagram showing interaction of management Kubernetes cluster and tenant Kubernetes clusters using Cluster API For its operation, Cluster API utilizes the concept of providers which are separate controllers responsible for specific components of the cluster being created. Within Cluster API, there are several types of providers. The major ones are: Infrastructure Provider, which is responsible for providing the computing infrastructure, such as virtual machines or physical servers. Control Plane Provider, which provides the Kubernetes control plane, namely the components kube-apiserver, kube-scheduler, and kube-controller-manager. Bootstrap Provider, which is used for generating cloud-init configuration for the virtual machines and servers being created. To get started, you will need to install the Cluster API itself and one provider of each type. You can find a complete list of supported providers in the project's documentation. For installation, you can use the clusterctl utility, or Cluster API Operator as the more declarative method. Choosing providers Infrastructure provider To run Kubernetes clusters using KubeVirt, the KubeVirt Infrastructure Provider must be installed. It enables the deployment of virtual machines for worker nodes in the same management cluster, where the Cluster API operates. Control plane provider The Kamaji project offers a ready solution for running the Kubernetes control plane for tenant clusters as containers within the management cluster. 
This approach has several significant advantages: Cost-effectiveness: Running the control plane in containers avoids the use of separate control plane nodes for each cluster, thereby significantly reducing infrastructure costs. Stability: Simplifying architecture by eliminating complex multi-layered deployment schemes. Instead of sequentially launching a virtual machine and then installing etcd and Kubernetes components inside it, there's a simple control plane that is deployed and run as a regular application inside Kubernetes and managed by an operator. Security: The cluster's control plane is hidden from the end user, reducing the possibility of its components being compromised, and also eliminates user access to the cluster's certificate store. This approach to organizing a control plane invisible to the user is often used by cloud providers. Bootstrap provider Kubeadm as the Bootstrap Provider - as the standard method for preparing clusters in Cluster API. This provider is developed as part of the Cluster API itself. It requires only a prepared system image with kubelet and kubeadm installed and allows generating configs in the cloud-init and ignition formats. It's worth noting that Talos Linux also supports provisioning via the Cluster API and has providers for this. Although previous articles discussed using Talos Linux to set up a management cluster on bare-metal nodes, to provision tenant clusters the Kamaji+Kubeadm approach has more advantages. It facilitates the deployment of Kubernetes control planes in containers, thus removing the need for separate virtual machines for control plane instances. This simplifies the management and reduces costs. How it works The primary object in Cluster API is the Cluster resource, which acts as the parent for all the others. Typically, this resource references two others: a resource describing the control plane and a resource describing the infrastructure, each managed by a separate provider. Unlike the Cluster, these two resources are not standardized, and their kind depends on the specific provider you are using: A diagram showing the relationship of a Cluster resource and the resources it links to in Cluster API Within Cluster API, there is also a resource named MachineDeployment, which describes a group of nodes, whether they are physical servers or virtual machines. This resource functions similarly to standard Kubernetes resources such as Deployment, ReplicaSet, and Pod, providing a mechanism for the declarative description of a group of nodes and automatic scaling. In other words, the MachineDeployment resource allows you to declaratively describe nodes for your cluster, automating their creation, deletion, and updating according to specified parameters and the requested number of replicas. 
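As a rough sketch (not taken from the article) of what such a MachineDeployment can look like when paired with the KubeVirt and kubeadm providers, where the names (tenant-1, md-0), the replica count, the Kubernetes version, and the provider API versions are illustrative assumptions to be checked against the providers you actually install:

apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: tenant-1-md-0
  namespace: default
spec:
  clusterName: tenant-1            # the parent Cluster resource
  replicas: 3                      # desired number of worker nodes
  selector:
    matchLabels: null              # filled in by the controller
  template:
    spec:
      clusterName: tenant-1
      version: v1.29.0             # Kubernetes version for the workers
      bootstrap:
        configRef:                 # cloud-init generation (Bootstrap Provider)
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: tenant-1-md-0
      infrastructureRef:           # virtual machine template (Infrastructure Provider)
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
        kind: KubevirtMachineTemplate
        name: tenant-1-md-0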
A diagram showing the relationship of a MachineDeployment resource and its children in Cluster API

To create machines, MachineDeployment refers to a template for generating the machine itself and a template for generating its cloud-init config:

A diagram showing the relationship of a MachineDeployment resource and the resources it links to in Cluster API

To deploy a new Kubernetes cluster using Cluster API, you will need to prepare the following set of resources:

A general Cluster resource
A KamajiControlPlane resource, responsible for the control plane operated by Kamaji
A KubevirtCluster resource, describing the cluster configuration in KubeVirt
A KubevirtMachineTemplate resource, responsible for the virtual machine template
A KubeadmConfigTemplate resource, responsible for generating tokens and cloud-init
At least one MachineDeployment to create some workers

Polishing the cluster

In most cases, this is sufficient, but depending on the providers used, you may need other resources as well. You can find examples of the resources created for each type of provider in the Kamaji project documentation. At this stage, you already have a ready tenant Kubernetes cluster, but so far, it contains nothing but API workers and a few core plugins that are standardly included in the installation of any Kubernetes cluster: kube-proxy and CoreDNS. For full integration, you will need to install several more components. To install additional components, you can use a separate Cluster API Add-on Provider for Helm, or the same FluxCD discussed in previous articles. When creating resources in FluxCD, it's possible to specify the target cluster by referring to the kubeconfig generated by Cluster API. Then, the installation will be performed directly into it. Thus, FluxCD becomes a universal tool for managing resources both in the management cluster and in the user tenant clusters.

A diagram showing the interaction scheme of fluxcd, which can install components in both management and tenant Kubernetes clusters

What components are being discussed here? Generally, the set includes the following:

CNI Plugin

To ensure communication between pods in a tenant Kubernetes cluster, it's necessary to deploy a CNI plugin. This plugin creates a virtual network that allows pods to interact with each other and is traditionally deployed as a DaemonSet on the cluster's worker nodes. You can choose and install any CNI plugin that you find suitable.

A diagram showing a CNI plugin installed inside the tenant Kubernetes cluster on a scheme of nested Kubernetes clusters

Cloud Controller Manager

The main task of the Cloud Controller Manager (CCM) is to integrate Kubernetes with the cloud infrastructure provider's environment (in your case, it is the management Kubernetes cluster in which all workers of tenant Kubernetes are provisioned). Here are some tasks it performs:

When a service of type LoadBalancer is created, the CCM initiates the process of creating a cloud load balancer, which directs traffic to your Kubernetes cluster.
If a node is removed from the cloud infrastructure, the CCM ensures its removal from your cluster as well, maintaining the cluster's current state.
When using the CCM, nodes are added to the cluster with a special taint, node.cloudprovider.kubernetes.io/uninitialized, which allows for the processing of additional business logic if necessary. After successful initialization, this taint is removed from the node.

Depending on the cloud provider, the CCM can operate both inside and outside the tenant cluster. 
The KubeVirt Cloud Provider is designed to be installed in the external parent management cluster. Thus, creating services of type LoadBalancer in the tenant cluster initiates the creation of LoadBalancer services in the parent cluster, which direct traffic into the tenant cluster. A diagram showing a Cloud Controller Manager installed outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters and the mapping of services it manages from the parent to the child Kubernetes cluster CSI Driver The Container Storage Interface (CSI) is divided into two main parts for interacting with storage in Kubernetes: csi-controller: This component is responsible for interacting with the cloud provider's API to create, delete, attach, detach, and resize volumes. csi-node: This component runs on each node and facilitates the mounting of volumes to pods as requested by kubelet. In the context of using the KubeVirt CSI Driver, a unique opportunity arises. Since virtual machines in KubeVirt runs within the management Kubernetes cluster, where a full-fledged Kubernetes API is available, this opens the path for running the csi-controller outside of the user's tenant cluster. This approach is popular in the KubeVirt community and offers several key advantages: Security: This method hides the internal cloud API from the end-user, providing access to resources exclusively through the Kubernetes interface. Thus, it reduces the risk of direct access to the management cluster from user clusters. Simplicity and Convenience: Users don't need to manage additional controllers in their clusters, simplifying the architecture and reducing the management burden. However, the CSI-node must necessarily run inside the tenant cluster, as it directly interacts with kubelet on each node. This component is responsible for the mounting and unmounting of volumes into pods, requiring close integration with processes occurring directly on the cluster nodes. The KubeVirt CSI Driver acts as a proxy for ordering volumes. When a PVC is created inside the tenant cluster, a PVC is created in the management cluster, and then the created PV is connected to the virtual machine. A diagram showing a CSI plugin components installed on both inside and outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters and the mapping of persistent volumes it manages from the parent to the child Kubernetes cluster Cluster Autoscaler The Cluster Autoscaler is a versatile component that can work with various cloud APIs, and its integration with Cluster-API is just one of the available functions. For proper configuration, it requires access to two clusters: the tenant cluster, to track pods and determine the need for adding new nodes, and the managing Kubernetes cluster (management kubernetes cluster), where it interacts with the MachineDeployment resource and adjusts the number of replicas. Although Cluster Autoscaler usually runs inside the tenant Kubernetes cluster, in this situation, it is suggested to install it outside for the same reasons described before. This approach is simpler to maintain and more secure as it prevents users of tenant clusters from accessing the management API of the management cluster. A diagram showing a Cluster Autoscaler installed outside of a tenant Kubernetes cluster on a scheme of nested Kubernetes clusters Konnectivity There's another additional component I'd like to mention - Konnectivity. 
You will likely need it later on to get webhooks and the API aggregation layer working in your tenant Kubernetes cluster. This topic is covered in detail in one of my previous articles. Unlike the components presented above, Kamaji allows you to easily enable Konnectivity and manage it as one of the core components of your tenant cluster, alongside kube-proxy and CoreDNS.

Conclusion

Now you have a fully functional Kubernetes cluster with the capability for dynamic scaling, automatic provisioning of volumes, and load balancers. Going forward, you might consider collecting metrics and logs from your tenant clusters, but that goes beyond the scope of this article. Of course, all the components necessary for deploying a Kubernetes cluster can be packaged into a single Helm chart and deployed as a unified application. This is precisely how we organize the deployment of managed Kubernetes clusters with the click of a button on our open PaaS platform, Cozystack, where you can try all the technologies described in the article for free. View the full article
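To complement the resource list in the post above, here is a hedged sketch of how the top-level Cluster resource might wire the Kamaji control plane and the KubeVirt infrastructure together; the names, replica count, version, and especially the provider API versions are assumptions to be verified against the provider releases you install:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tenant-1
  namespace: default
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # Kamaji control plane provider
    kind: KamajiControlPlane
    name: tenant-1
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1 # KubeVirt infrastructure provider
    kind: KubevirtCluster                                # defined separately
    name: tenant-1
---
apiVersion: controlplane.cluster.x-k8s.io/v1alpha1
kind: KamajiControlPlane
metadata:
  name: tenant-1
  namespace: default
spec:
  replicas: 2          # control plane pods run as containers in the management cluster
  version: v1.29.0     # tenant cluster Kubernetes version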
  21. Author: Andrei Kvapil (Ænix)

Continuing our series of posts on how to build your own cloud using just the Kubernetes ecosystem. In the previous article, we explained how we prepare a basic Kubernetes distribution based on Talos Linux and Flux CD. In this article, we'll show you a few different virtualization technologies in Kubernetes and prepare everything needed to run virtual machines in Kubernetes, primarily storage and networking. We will talk about technologies such as KubeVirt, LINSTOR, and Kube-OVN.

But first, let's explain what virtual machines are needed for, and why you can't just use Docker containers for building a cloud. The reason is that containers do not provide a sufficient level of isolation. Although the situation improves year by year, we often encounter vulnerabilities that allow escaping the container sandbox and elevating privileges in the system. On the other hand, Kubernetes was not originally designed to be a multi-tenant system, meaning the basic usage pattern involves creating a separate Kubernetes cluster for every independent project and development team. Virtual machines are the primary means of isolating tenants from each other in a cloud environment. In virtual machines, users can execute code and programs with administrative privileges, but this doesn't affect other tenants or the environment itself. In other words, virtual machines make it possible to achieve hard multi-tenancy isolation and to run in environments where tenants do not trust each other.

Virtualization technologies in Kubernetes

There are several different technologies that bring virtualization into the Kubernetes world: KubeVirt and Kata Containers are the most popular ones. But you should know that they work differently. Kata Containers implements the CRI (Container Runtime Interface) and provides an additional level of isolation for standard containers by running them in virtual machines. But they still work within the same, single Kubernetes cluster.

A diagram showing how container isolation is ensured by running containers in virtual machines with Kata Containers

KubeVirt allows running traditional virtual machines using the Kubernetes API. KubeVirt virtual machines are run as regular Linux processes in containers. In other words, in KubeVirt, a container is used as a sandbox for running virtual machine (QEMU) processes. This can be clearly seen in the figure below, by looking at how live migration of virtual machines is implemented in KubeVirt. When migration is needed, the virtual machine moves from one container to another.

A diagram showing live migration of a virtual machine from one container to another in KubeVirt

There is also an alternative project, Virtink, which implements lightweight virtualization using Cloud Hypervisor and is initially focused on running virtual Kubernetes clusters using the Cluster API. Considering our goals, we decided to use KubeVirt as the most popular project in this area. Besides, we have extensive expertise and have already made a lot of contributions to KubeVirt. KubeVirt is easy to install and allows you to run virtual machines out of the box using the containerDisk feature, which allows you to store and distribute VM images directly as OCI images from a container image registry. Virtual machines with containerDisk are well suited for creating Kubernetes worker nodes and other VMs that do not require state persistence. For managing persistent data, KubeVirt offers a separate tool, Containerized Data Importer (CDI). 
It allows for cloning PVCs and populating them with data from base images. The CDI is necessary if you want to automatically provision persistent volumes for your virtual machines, and it is also required for the KubeVirt CSI Driver, which is used to handle persistent volume claims from tenant Kubernetes clusters. But first, you have to decide where and how you will store this data.

Storage for Kubernetes VMs

With the introduction of the CSI (Container Storage Interface), a wide range of technologies that integrate with Kubernetes has become available. In fact, KubeVirt fully utilizes the CSI interface, aligning the choice of storage for virtualization closely with the choice of storage for Kubernetes itself. However, there are nuances which you need to consider. Unlike containers, which typically use a standard filesystem, block devices are more efficient for virtual machines. Although the CSI interface in Kubernetes allows the request of both types of volumes, filesystems and block devices, it's important to verify that your storage backend supports this. Using block devices for virtual machines eliminates the need for an additional abstraction layer, such as a filesystem, which makes them more performant and in most cases enables the use of the ReadWriteMany mode. This mode allows concurrent access to the volume from multiple nodes, which is a critical feature for enabling the live migration of virtual machines in KubeVirt.

The storage system can be external or internal (in the case of hyper-converged infrastructure). Using external storage in many cases makes the whole system more stable, as your data is stored separately from compute nodes.

A diagram showing external data storage communication with the compute nodes

External storage solutions are often popular in enterprise systems because such storage is frequently provided by an external vendor that takes care of its operations. The integration with Kubernetes involves only a small component installed in the cluster - the CSI driver. This driver is responsible for provisioning volumes in this storage and attaching them to pods run by Kubernetes. However, such storage solutions can also be implemented using purely open-source technologies. One of the popular solutions is TrueNAS powered by the democratic-csi driver.

A diagram showing local data storage running on the compute nodes

On the other hand, hyper-converged systems are often implemented using local storage (when you do not need replication) and with software-defined storage, often installed directly in Kubernetes, such as Rook/Ceph, OpenEBS, Longhorn, LINSTOR, and others.

A diagram showing clustered data storage running on the compute nodes

A hyper-converged system has its advantages. For example, data locality: when your data is stored locally, access to that data is faster. But there are also disadvantages, as such a system is usually more difficult to manage and maintain.

At Ænix, we wanted to provide a ready-to-use solution that could be used without the need to purchase and set up additional external storage, and that was optimal in terms of speed and resource utilization. LINSTOR became that solution. Time-tested and industry-popular technologies such as LVM and ZFS as its backends give confidence that data is securely stored. DRBD-based replication is incredibly fast and consumes a small amount of computing resources. For installing LINSTOR in Kubernetes, there is the Piraeus project, which already provides ready-made block storage to use with KubeVirt. 
Note: In case you are using Talos Linux, as we described in the previous article, you will need to enable the necessary kernel modules in advance, and configure piraeus as described in the instruction. Networking for Kubernetes VMs Despite having the similar interface - CNI, The network architecture in Kubernetes is actually more complex and typically consists of many independent components that are not directly connected to each other. In fact, you can split Kubernetes networking into four layers, which are described below. Node Network (Data Center Network) The network through which nodes are interconnected with each other. This network is usually not managed by Kubernetes, but it is an important one because, without it, nothing would work. In practice, the bare metal infrastructure usually has more than one of such networks e.g. one for node-to-node communication, second for storage replication, third for external access, etc. A diagram showing the role of the node network (data center network) on the Kubernetes networking scheme Configuring the physical network interaction between nodes goes beyond the scope of this article, as in most situations, Kubernetes utilizes already existing network infrastructure. Pod Network This is the network provided by your CNI plugin. The task of the CNI plugin is to ensure transparent connectivity between all containers and nodes in the cluster. Most CNI plugins implement a flat network from which separate blocks of IP addresses are allocated for use on each node. A diagram showing the role of the pod network (CNI-plugin) on the Kubernetes network scheme In practice, your cluster can have several CNI plugins managed by Multus. This approach is often used in virtualization solutions based on KubeVirt - Rancher and OpenShift. The primary CNI plugin is used for integration with Kubernetes services, while additional CNI plugins are used to implement private networks (VPC) and integration with the physical networks of your data center. The default CNI-plugins can be used to connect bridges or physical interfaces. Additionally, there are specialized plugins such as macvtap-cni which are designed to provide more performance. One additional aspect to keep in mind when running virtual machines in Kubernetes is the need for IPAM (IP Address Management), especially for secondary interfaces provided by Multus. This is commonly managed by a DHCP server operating within your infrastructure. Additionally, the allocation of MAC addresses for virtual machines can be managed by Kubemacpool. Although in our platform, we decided to go another way and fully rely on Kube-OVN. This CNI plugin is based on OVN (Open Virtual Network) which was originally developed for OpenStack and it provides a complete network solution for virtual machines in Kubernetes, features Custom Resources for managing IPs and MAC addresses, supports live migration with preserving IP addresses between the nodes, and enables the creation of VPCs for physical network separation between tenants. In Kube-OVN you can assign separate subnets to an entire namespace or connect them as additional network interfaces using Multus. Services Network In addition to the CNI plugin, Kubernetes also has a services network, which is primarily needed for service discovery. Contrary to traditional virtual machines, Kubernetes is originally designed to run pods with a random address. 
And the services network provides a convenient abstraction (stable IP addresses and DNS names) that will always direct traffic to the correct pod. The same approach is also commonly used with virtual machines in clouds, despite the fact that their IPs are usually static.

A diagram showing the role of the services network (services network plugin) on the Kubernetes network scheme

The implementation of the services network in Kubernetes is handled by the services network plugin. The standard implementation is called kube-proxy and is used in most clusters. But nowadays, this functionality might be provided as part of the CNI plugin. The most advanced implementation is offered by the Cilium project, which can be run in kube-proxy replacement mode. Cilium is based on the eBPF technology, which allows for efficient offloading of the Linux networking stack, thereby improving performance and security compared to traditional methods based on iptables. In practice, Cilium and Kube-OVN can be easily integrated to provide a unified solution that offers seamless, multi-tenant networking for virtual machines, as well as advanced network policies and combined services network functionality.

External Traffic Load Balancer

At this stage, you already have everything needed to run virtual machines in Kubernetes. But there is actually one more thing. You still need to access your services from outside your cluster, and an external load balancer will help you with organizing this. For bare metal Kubernetes clusters, there are several load balancers available: MetalLB, kube-vip, and LoxiLB; Cilium and Kube-OVN also provide built-in implementations. The role of an external load balancer is to provide a stable address available externally and direct external traffic to the services network. The services network plugin will direct it to your pods and virtual machines as usual.

A diagram showing the role of the external load balancer on the Kubernetes network scheme

In most cases, setting up a load balancer on bare metal is achieved by creating a floating IP address on the nodes within the cluster and announcing it externally using ARP/NDP or BGP protocols. After exploring various options, we decided that MetalLB is the simplest and most reliable solution, although we do not strictly enforce its use. Another benefit is that in L2 mode, MetalLB speakers continuously check their neighbours' state by performing liveness checks using the memberlist protocol. This enables failover that works independently of the Kubernetes control plane.

Conclusion

This concludes our overview of virtualization, storage, and networking in Kubernetes. The technologies mentioned here are available and already pre-configured on the Cozystack platform, where you can try them with no limitations. In the next article, I'll detail how, on top of this, you can implement the provisioning of fully functional Kubernetes clusters with just the click of a button. View the full article
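As a small, hedged illustration of the MetalLB L2 setup described in the post above, assuming MetalLB's CRD-based configuration (metallb.io/v1beta1) and a placeholder address range that would have to be routable in your data center network:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: external-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.100.200-192.168.100.250   # placeholder range for LoadBalancer services
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: external-l2
  namespace: metallb-system
spec:
  ipAddressPools:
  - external-pool                     # announce this pool via ARP/NDP from the speakers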
  22. Author: Andrei Kvapil (Ænix) At Ænix, we have a deep affection for Kubernetes and dream that all modern technologies will soon start utilizing its remarkable patterns. Have you ever thought about building your own cloud? I bet you have. But is it possible to do this using only modern technologies and approaches, without leaving the cozy Kubernetes ecosystem? Our experience in developing Cozystack required us to delve deeply into it. You might argue that Kubernetes is not intended for this purpose and why not simply use OpenStack for bare metal servers and run Kubernetes inside it as intended. But by doing so, you would simply shift the responsibility from your hands to the hands of OpenStack administrators. This would add at least one more huge and complex system to your ecosystem. Why complicate things? - after all, Kubernetes already has everything needed to run tenant Kubernetes clusters at this point. I want to share with you our experience in developing a cloud platform based on Kubernetes, highlighting the open-source projects that we use ourselves and believe deserve your attention. In this series of articles, I will tell you our story about how we prepare managed Kubernetes from bare metal using only open-source technologies. Starting from the basic level of data center preparation, running virtual machines, isolating networks, setting up fault-tolerant storage to provisioning full-featured Kubernetes clusters with dynamic volume provisioning, load balancers, and autoscaling. With this article, I start a series consisting of several parts: Part 1: Preparing the groundwork for your cloud. Challenges faced during the preparation and operation of Kubernetes on bare metal and a ready-made recipe for provisioning infrastructure. Part 2: Networking, storage, and virtualization. How to turn Kubernetes into a tool for launching virtual machines and what is needed for this. Part 3: Cluster API and how to start provisioning Kubernetes clusters at the push of a button. How autoscaling works, dynamic provisioning of volumes, and load balancers. I will try to describe various technologies as independently as possible, but at the same time, I will share our experience and why we came to one solution or another. To begin with, let's understand the main advantage of Kubernetes and how it has changed the approach to using cloud resources. It is important to understand that the use of Kubernetes in the cloud and on bare metal differs. Kubernetes in the cloud When you operate Kubernetes in the cloud, you don't worry about persistent volumes, cloud load balancers, or the process of provisioning nodes. All of this is handled by your cloud provider, who accepts your requests in the form of Kubernetes objects. In other words, the server side is completely hidden from you, and you don't really want to know how exactly the cloud provider implements as it's not in your area of responsibility. A diagram showing cloud Kubernetes, with load balancing and storage done outside the cluster Kubernetes offers convenient abstractions that work the same everywhere, allowing you to deploy your application on any Kubernetes in any cloud. In the cloud, you very commonly have several separate entities: the Kubernetes control plane, virtual machines, persistent volumes, and load balancers as distinct entities. Using these entities, you can create highly dynamic environments. Thanks to Kubernetes, virtual machines are now only seen as a utility entity for utilizing cloud resources. 
You no longer store data inside virtual machines. You can delete all your virtual machines at any moment and recreate them without breaking your application. The Kubernetes control plane will continue to hold information about what should run in your cluster. The load balancer will keep sending traffic to your workload, simply changing the endpoint to send traffic to a new node. And your data will be safely stored in external persistent volumes provided by the cloud. This approach is fundamental when using Kubernetes in clouds. The reason for it is quite obvious: the simpler the system, the more stable it is, and this simplicity is what you are buying when you buy Kubernetes in the cloud.

Kubernetes on bare metal

Using Kubernetes in the clouds is really simple and convenient, which cannot be said about bare metal installations. In the bare metal world, Kubernetes, on the contrary, becomes unbearably complex. Firstly, because the entire network, backend storage, cloud balancers, etc. are usually run not outside, but inside your cluster. As a result, such a system is much more difficult to update and maintain.

A diagram showing bare metal Kubernetes, with load balancing and storage done inside the cluster

Judge for yourself: in the cloud, to update a node, you typically delete the virtual machine (or even use kubectl delete node) and let your node management tooling create a new one, based on an immutable image. The new node will join the cluster and “just work” as a node, following a very simple and commonly used pattern in the Kubernetes world. Many clusters order new virtual machines every few minutes, simply because they can use cheaper spot instances. However, when you have a physical server, you can't just delete and recreate it, firstly because it often runs some cluster services, stores data, and its update process is significantly more complicated. There are different approaches to solving this problem, ranging from in-place updates, as done by kubeadm, kubespray, and k3s, to full automation of provisioning physical nodes through Cluster API and Metal3. I like the hybrid approach offered by Talos Linux, where your entire system is described in a single configuration file. Most parameters of this file can be applied without rebooting or recreating the node, including the version of Kubernetes control-plane components. However, it still keeps the maximum declarative nature of Kubernetes. This approach minimizes unnecessary impact on cluster services when updating bare metal nodes. In most cases, you won't need to migrate your virtual machines and rebuild the cluster filesystem on minor updates.

Preparing a base for your future cloud

So, suppose you've decided to build your own cloud. To start somewhere, you need a base layer. You need to think not only about how you will install Kubernetes on your servers but also about how you will update and maintain it. Consider the fact that you will have to think about things like updating the kernel, installing necessary modules, as well as packages and security patches. You now have to think about much more than you would when using ready-made Kubernetes in the cloud. Of course, you can use standard distributions like Ubuntu or Debian, or you can consider specialized ones like Flatcar Container Linux, Fedora Core, and Talos Linux. Each has its advantages and disadvantages. What about us? 
At Ænix, we use quite a few specific kernel modules like ZFS, DRBD, and OpenvSwitch, so we decided to go the route of forming a system image with all the necessary modules in advance. In this case, Talos Linux turned out to be the most convenient for us. For example, such a config is enough to build a system image with all the necessary kernel modules:

arch: amd64
platform: metal
secureboot: false
version: v1.6.4
input:
  kernel:
    path: /usr/install/amd64/vmlinuz
  initramfs:
    path: /usr/install/amd64/initramfs.xz
  baseInstaller:
    imageRef: ghcr.io/siderolabs/installer:v1.6.4
  systemExtensions:
    - imageRef: ghcr.io/siderolabs/amd-ucode:20240115
    - imageRef: ghcr.io/siderolabs/amdgpu-firmware:20240115
    - imageRef: ghcr.io/siderolabs/bnx2-bnx2x:20240115
    - imageRef: ghcr.io/siderolabs/i915-ucode:20240115
    - imageRef: ghcr.io/siderolabs/intel-ice-firmware:20240115
    - imageRef: ghcr.io/siderolabs/intel-ucode:20231114
    - imageRef: ghcr.io/siderolabs/qlogic-firmware:20240115
    - imageRef: ghcr.io/siderolabs/drbd:9.2.6-v1.6.4
    - imageRef: ghcr.io/siderolabs/zfs:2.1.14-v1.6.4
output:
  kind: installer
  outFormat: raw

Then we use the docker command line tool to build an OS image:

cat config.yaml | docker run --rm -i -v /dev:/dev --privileged "ghcr.io/siderolabs/imager:v1.6.4" -

And as a result, we get a Docker container image with everything we need, which we can use to install Talos Linux on our servers. You can do the same; this image will contain all the necessary firmware and kernel modules.

But the question arises, how do you deliver the freshly formed image to your nodes? I have been contemplating the idea of PXE booting for quite some time. For example, the Kubefarm project that I wrote an article about two years ago was entirely built using this approach. But unfortunately, it does not help you to deploy your very first parent cluster that will hold the others. So now we have prepared a solution that will help you do this using the same PXE approach. Essentially, all you need to do is run temporary DHCP and PXE servers inside containers. Then your nodes will boot from your image, and you can use a simple Debian-flavored script to help you bootstrap your nodes.

The source for that talos-bootstrap script is available on GitHub. This script allows you to deploy Kubernetes on bare metal in five minutes and obtain a kubeconfig for accessing it. However, many unresolved issues still lie ahead.

Delivering system components

At this stage, you already have a Kubernetes cluster capable of running various workloads. However, it is not fully functional yet. In other words, you need to set up networking and storage, as well as install necessary cluster extensions, like KubeVirt to run virtual machines, as well as the monitoring stack and other system-wide components.

Traditionally, this is solved by installing Helm charts into your cluster. You can do this by running helm install commands locally, but this approach becomes inconvenient when you want to track updates, and if you have multiple clusters and you want to keep them uniform. In fact, there are plenty of ways to do this declaratively. To solve this, I recommend using GitOps best practices. I mean tools like ArgoCD and FluxCD. While ArgoCD is more convenient for dev purposes with its graphical interface and a central control plane, FluxCD, on the other hand, is better suited for creating Kubernetes distributions. With FluxCD, you can specify which charts with what parameters should be launched and describe dependencies. 
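For illustration only, a minimal sketch of what such a FluxCD definition can look like — a HelmRelease that waits for another release before installing. The repository URL, chart names, and values are made-up placeholders, not Cozystack's actual configuration, and the API versions depend on the Flux release you run:

apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: example-charts
  namespace: flux-system
spec:
  interval: 1h
  url: https://example.github.io/charts    # placeholder chart repository
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kubevirt-operator
  namespace: flux-system
spec:
  interval: 10m
  dependsOn:
    - name: cilium                          # do not install until the CNI release is ready
  chart:
    spec:
      chart: kubevirt-operator              # placeholder chart name
      sourceRef:
        kind: HelmRepository
        name: example-charts
  values:
    someOption: true                        # chart parameters go here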
Then, FluxCD will take care of everything for you. It is enough to perform a one-time installation of FluxCD in your newly created cluster and provide it with the configuration; it will then automatically deploy all the essentials, bringing the cluster to the desired state. For example, after installing our platform, you'll see the following pre-configured Helm charts with system components:

NAMESPACE                        NAME                        AGE    READY  STATUS
cozy-cert-manager                cert-manager                4m1s   True   Release reconciliation succeeded
cozy-cert-manager                cert-manager-issuers        4m1s   True   Release reconciliation succeeded
cozy-cilium                      cilium                      4m1s   True   Release reconciliation succeeded
cozy-cluster-api                 capi-operator               4m1s   True   Release reconciliation succeeded
cozy-cluster-api                 capi-providers              4m1s   True   Release reconciliation succeeded
cozy-dashboard                   dashboard                   4m1s   True   Release reconciliation succeeded
cozy-fluxcd                      cozy-fluxcd                 4m1s   True   Release reconciliation succeeded
cozy-grafana-operator            grafana-operator            4m1s   True   Release reconciliation succeeded
cozy-kamaji                      kamaji                      4m1s   True   Release reconciliation succeeded
cozy-kubeovn                     kubeovn                     4m1s   True   Release reconciliation succeeded
cozy-kubevirt-cdi                kubevirt-cdi                4m1s   True   Release reconciliation succeeded
cozy-kubevirt-cdi                kubevirt-cdi-operator       4m1s   True   Release reconciliation succeeded
cozy-kubevirt                    kubevirt                    4m1s   True   Release reconciliation succeeded
cozy-kubevirt                    kubevirt-operator           4m1s   True   Release reconciliation succeeded
cozy-linstor                     linstor                     4m1s   True   Release reconciliation succeeded
cozy-linstor                     piraeus-operator            4m1s   True   Release reconciliation succeeded
cozy-mariadb-operator            mariadb-operator            4m1s   True   Release reconciliation succeeded
cozy-metallb                     metallb                     4m1s   True   Release reconciliation succeeded
cozy-monitoring                  monitoring                  4m1s   True   Release reconciliation succeeded
cozy-postgres-operator           postgres-operator           4m1s   True   Release reconciliation succeeded
cozy-rabbitmq-operator           rabbitmq-operator           4m1s   True   Release reconciliation succeeded
cozy-redis-operator              redis-operator              4m1s   True   Release reconciliation succeeded
cozy-telepresence                telepresence                4m1s   True   Release reconciliation succeeded
cozy-victoria-metrics-operator   victoria-metrics-operator   4m1s   True   Release reconciliation succeeded

Conclusion

As a result, you achieve a highly repeatable environment that you can provide to anyone, knowing that it operates exactly as intended. This is actually what the Cozystack project does, which you can try out for yourself absolutely free. In the following articles, I will discuss how to prepare Kubernetes for running virtual machines and how to run Kubernetes clusters with the click of a button. Stay tuned, it'll be fun! View the full article
  23. Google Kubernetes Engine (GKE) offers two different ways to perform service discovery and DNS resolution: the in-cluster kube-dns functionality, and GCP managed Cloud DNS. Either approach can be combined with the performance-enhancing NodeLocal DNSCache add-on. New GKE Autopilot clusters use Cloud DNS as a fully managed DNS solution for your GKE Autopilot clusters without any configuration required on your part. But for GKE Standard clusters, you have the following DNS provider choices: Kube-dns (default) Cloud DNS - configured for either cluster-scope or VPC scope, and Install and run your own DNS (like Core DNS) In this blog, we break down the differences between the DNS providers for your GKE Standard clusters, and guide you to the best solution for your specific situation. Kube-DNS kube-dns is the default DNS provider for Standard GKE clusters, providing DNS resolution for services and pods within the cluster. If you select this option, GKE deploys the necessary kube-dns components such as Kube-dns pods, Kube-dns-autoscaler, Kube-dns configmap and Kube-dns service in the kube-system namespace. kube-dns is the default DNS provider for GKE Standard clusters and the only DNS provider for Autopilot clusters running versions earlier than 1.25.9-gke.400 and 1.26.4-gke.500. Kube-dns is a suitable solution for workloads with moderate DNS query volumes that have stringent DNS resolution latency requirements (e.g. under ~2-4ms). Kube-dns is able to provide low latency DNS resolution for all DNS queries as all the DNS resolutions are performed within the cluster. If you notice DNS timeouts or failed DNS resolutions for bursty workload traffic patterns when using kube-dns, consider scaling the number of kube-dns pods, and enabling NodeLocal DNS cache for the cluster. You can scale the number of kube-dns pods beforehand using Kube-dns autoscaler, and manually tuning it to the cluster's DNS traffic patterns. Using kube-dns along with Nodelocal DNS cache (discussed below) also reduces overhead on the kube-dns pods for DNS resolution of external services. While scaling up kube-dns and using NodeLocal DNS Cache(NLD) helps in the short term, it does not guarantee reliable DNS resolution during sudden traffic spikes. Hence migrating to Cloud DNS provides a more robust and long-term solution for improved reliability of DNS resolution consistently across varying DNS query volumes. You can update the DNS provider for your existing GKE Standard from kube-dns to Cloud DNS without requiring to re-create your existing cluster. For logging the DNS queries when using kube-dns, there is manual effort required in creating a new kube-dns debug pod with log-queries enabled. Cloud DNS Cloud DNS is a Google-managed service that is designed for high scalability and availability. In addition, Cloud DNS elastically scales to adapt to your DNS query volume, providing consistent and reliable DNS query resolution regardless of traffic volume. Cloud DNS simplifies your operations and minimizes operational overhead since it is a Google managed service and does not require you to maintain any additional infrastructure. Cloud DNS supports dns resolutions across the entire VPC, which is something not currently possible with kube-dns. Also, while using Multi Cluster Services (MCS) in GKE, Cloud DNS provides DNS resolution for services across your fleet of clusters. 
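As a hedged sketch (flag names as documented for GKE at the time of writing; verify against the current gcloud reference), switching an existing Standard cluster from kube-dns to Cloud DNS in cluster scope looks roughly like this, with CLUSTER_NAME, COMPUTE_LOCATION, and POOL_NAME as placeholders:

# Switch the cluster's DNS provider to Cloud DNS, cluster scope.
gcloud container clusters update CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --cluster-dns=clouddns \
    --cluster-dns-scope=cluster

# Existing nodes keep using kube-dns until they are re-created,
# for example by upgrading the node pool:
gcloud container clusters upgrade CLUSTER_NAME \
    --location=COMPUTE_LOCATION \
    --node-pool=POOL_NAME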
Unlike kube-dns, Google Cloud’s hosted DNS service Cloud DNS provides Pod and Service DNS resolution that auto-scales and offers a 100% service-level agreement, reducing DNS timeouts and providing consistent DNS resolution latency for heavy DNS workloads. Cloud DNS also integrates with Cloud Monitoring, giving you greater visibility into DNS queries for enhanced troubleshooting and analysis. The Cloud DNS controller automatically provisions DNS records for pods and services in Cloud DNS for ClusterIP, headless and external name services. You can configure Cloud DNS to provide GKE DNS resolution in either VPC or Cluster (the default) scope. With VPC scope, the DNS records are resolvable with the entire VPC. This is achieved with the private DNS zone that gets created automatically. With Cluster scope, the DNS records are resolvable only within the cluster. While Cloud DNS offers enhanced features, it does come with usage-based costs. You save on compute costs and overhead by removing kube-dns pods when using Cloud DNS. Considering the typical cluster size workload traffic patterns, Cloud DNS is usually more cost effective than running kube-dns You can migrate clusters from kube-dns to Cloud DNS cluster scope without downtime or changes to your applications. The reverse (migrating from Cloud DNS to kube-dns) is not a seamless operation. NodeLocal DNSCache NodeLocal DNSCache is a GKE add-on that you can run in addition to kube-dns and Cloud DNS. The node-local-dns pod gets deployed on the GKE nodes after the option has been enabled (subject to a node upgrade procedure). Nodelocal DNS Cache (NLD) helps to reduce the average DNS resolution times by resolving the DNS requests locally on the same nodes as the pods, and only forwards requests that it cannot resolve to the other DNS servers in the cluster. This is a great fit for clusters that have heavy internal DNS query loads. Enable NLD during maintenance windows. Please note that node pools must be re-created for this change to take effect. Final thoughts The choice of DNS provider for your GKE Standard cluster has implications for the performance and reliability, in addition to your operations and overall service discovery architecture. Hence, it is crucial for GKE Standard users to understand their DNS options taking into account their application and architecture objectives. Standard GKE clusters allow you to use either kube-dns or Cloud DNS as your DNS provider, allowing you to optimize for either low latency DNS resolution or a simple, scalable and reliable DNS solution for GKE Standard clusters. You can learn more about DNS for your GKE cluster from the GKE documentation . If you have any further questions, feel free to contact us. We thank the Google Cloud team member who contributed to the blog: Selin Goksu, Technical Solutions Developer, Google View the full article
  24. Autopilot mode for Google Kubernetes Engine (GKE) is our take on a fully managed, Pod-based Kubernetes platform. It provides category-defining features with a fully functional Kubernetes API with support for StatefulSet with block storage, GPU and other critical functionality that you don’t often find in nodeless/serverless-style Kubernetes offerings, while still offering a Pod-level SLA and a super-simple developer API. But until now, Autopilot, like other products in this category, did not offer the ability to temporarily burst CPU and memory resources beyond what was requested by the workload.

I’m happy to announce that now, powered by the unique design of GKE Autopilot on Google Cloud, we are able to bring burstable workload support to GKE Autopilot. Bursting allows your Pod to temporarily utilize resources outside of those resources that it requests and is billed for. How does this work, and how can Autopilot offer burstable support, given the Pod-based model? The key is that in Autopilot mode we still group Pods together on Nodes. This is what powers several unique features of Autopilot such as our flexible Pod sizes. With this change, the capacity of your Pods is pooled, and Pods that set a limit higher than their requests can temporarily burst into this capacity (if it’s available).

With the introduction of burstable support, we’re also introducing another groundbreaking change: 50m CPU Pods — that is, Pods as small as 1/20th of a vCPU. Until now, the smallest Pod we offered was ¼ of a vCPU (250m CPU) — five times bigger. Combined with burst, the door is now open to run high-density-yet-small workloads on Autopilot, without constraining each Pod to its resource requests. We’re also dropping the 250m CPU resource increment, so you can create any size of Pod you like between the minimum and the maximum size, for example, 59m CPU, 302m, 808m, 7682m etc (the memory to CPU ratio is automatically kept within a supported range, sizing up if needed). With these finer-grained increments, Vertical Pod Autoscaling now works even better, helping you tune your workload sizes automatically. If you want to add a little more automation to figuring out the right resource requirements, give Vertical Pod Autoscaling a try!

Here’s what our customer Ubie had to say: “GKE Autopilot frees us from managing infrastructure so we can focus on developing and deploying our applications. It eliminates the challenges of node optimization, a critical benefit in our fast-paced startup environment. Autopilot's new bursting feature and lowered CPU minimums offer cost optimization and support multi-container architectures such as sidecars. We were already running many workloads in Autopilot mode, and with these new features, we're excited to migrate our remaining clusters to Autopilot mode.” - Jun Sakata, Head of Platform Engineering, Ubie

A new home for high-density workloads

Autopilot is now a great place to run high-density workloads. Let’s say for example you have a multi-tenant architecture with many smaller replicas of a Pod, each running a website that receives relatively little traffic, but has the occasional spike. Combining the new burstable feature and the lower 50m CPU minimum, you can now deploy thousands of these Pods in a cost-effective manner (up to 20 per vCPU), while enabling burst so that when that traffic spike comes in, that Pod can temporarily burst into the pooled capacity of these replicas. 
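As an illustration only (not an official Google example), a Deployment that takes advantage of the new minimum and of bursting might set requests at 50m CPU and a higher limit; the image, replica count, and resource values are placeholder assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tiny-site
spec:
  replicas: 20
  selector:
    matchLabels:
      app: tiny-site
  template:
    metadata:
      labels:
        app: tiny-site
    spec:
      containers:
      - name: web
        image: nginx:1.25          # placeholder image
        resources:
          requests:
            cpu: 50m               # the new Autopilot minimum; this is what you are billed for
            memory: 50Mi
          limits:
            cpu: 250m              # limit above request allows bursting into spare pooled capacity
            memory: 128Mi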
What’s even better is that you don’t need to concern yourself with bin-packing these workloads, or the problem where you may have a large node that’s underutilized (for example, a 32 core node running a single 50m CPU Pod). Autopilot takes care of all of that for you, so you can focus on what’s key to your business: building great services.

Calling all startups, students, and solopreneurs

Everyone needs to start somewhere, and if Autopilot wasn’t the perfect home for your workload before, we think it is now. With the 50m CPU minimum size, you can run individual containers in us-central1 for under $2/month each (50m CPU, 50MiB). And thanks to burst, these containers can use a little extra CPU when traffic spikes, so they’re not completely constrained to that low size. And if this workload isn’t mission-critical, you can even run it in Spot mode, where the price is even lower. In fact, thanks to GKE’s free tier, your costs in us-central1 can be as low as $30/month for small workloads (including production-grade load balancing) while tapping into the same world-class infrastructure that powers some of the biggest sites on the internet today. Importantly, if your startup grows, you can scale in-place without needing to migrate — since you’re already running on a production-grade Kubernetes cluster. So you can start small, while being confident that there is nothing limiting about your operating environment. We’re counting on your success as well, so good luck!

And if you’re learning GKE or Kubernetes, this is a great platform to learn on. Nothing beats learning on real production systems — after all, that’s what you’ll be using on the job. With one free cluster per account, and all these new features, Autopilot is a fantastic learning environment. Plus, if you delete your Kubernetes workload and networking resources when you’re done for the day, any associated costs cease as well. When you’re ready to resume, just create those resources again, and you’re back at it. You can even persist state between learning sessions by deleting just the compute resources (and keeping the persistent disks). Don’t forget there’s a $300 free trial to cover your initial usage as well!

Next steps:

Learn about Pod bursting in GKE.
Read more about the minimum and maximum resource requests in Autopilot mode.
Learn how to set resource requirements automatically with VPA.
Try GKE’s Autopilot mode for a workload-based API and simpler Day 2 ops with lower TCO.
Going to Next ‘24? Check out session DEV224 to hear Ubie talk about how it uses burstable workloads in GKE.

View the full article
  25. It’s Saturday night. You’re out to dinner with friends. Suddenly, a familiar tune emits from your pocket. Dread fills you as you fish your phone out of your pocket and unlock it. You tap the alert. Maybe it’s a lucky night and this is one alert you can just snooze or resolve. Maybe it’s a bad night, and the next step is you pulling your laptop from your bag — because you bring your laptop everywhere when you’re on-call — and trying to troubleshoot a problem in a crowded, noisy restaurant. The post How to Escape the 3 AM Page as a Kubernetes Site Reliability Engineer appeared first on Security Boulevard. View the full article