Posted August 28, 2024

Kubernetes has become the de facto standard for container orchestration, providing powerful capabilities for deploying and managing stateless workloads. However, users running stateful applications on Kubernetes face unique challenges, especially in VMware environments. A key issue is that the virtual disks used by stateful apps can't be attached to pods as easily as ephemeral storage: the volumes need to persist even when pods fail and restart. IT teams therefore need to carefully evaluate these challenges and constraints before running stateful workloads on Kubernetes clusters on VMware.

Users who run containerized workloads on Kubernetes clusters in their vSphere environment can use Amazon EKS Anywhere (EKS Anywhere). EKS Anywhere on vSphere does not include a default Container Storage Interface (CSI) driver; however, VMware offers a CSI driver with the vSphere Container Storage Plug-in for managing stateful workloads. The vSphere Container Storage Plug-in is a volume plug-in that runs in a native Kubernetes cluster deployed in vSphere and is responsible for provisioning persistent volumes on vSphere storage. An advantage of using this plug-in is its snapshot capability, which is important for backup and disaster recovery (DR) scenarios.

GitOps manages application and infrastructure deployment so that the system is described declaratively in a Git repository. It is an operational model that allows you to manage the state of multiple Kubernetes clusters using the best practices of version control, immutable artifacts, and automation. Flux is a GitOps tool that can automate the deployment of applications on Kubernetes as well as manage EKS Anywhere clusters. It works by continuously monitoring the state of a Git repository and applying changes to the cluster.

In this post, we demonstrate the process of using GitOps to deploy and manage stateful workloads on your EKS Anywhere cluster in your vSphere environment with the vSphere CSI driver.
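As a concrete illustration of the Flux model described above, a GitRepository resource tells Flux which repository and branch to watch, and a Kustomization tells it which path to apply and how often to reconcile. The following is a minimal sketch; the names, URL, and path are illustrative, not the ones used later in this post:

```yaml
# GitRepository: the source Flux continuously polls for changes.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: example-repo            # illustrative name
  namespace: flux-system
spec:
  interval: 1m                  # how often to check the repository
  url: https://github.com/example/repo   # illustrative URL
  ref:
    branch: main
---
# Kustomization: the manifests Flux applies and keeps reconciled.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app             # illustrative name
  namespace: flux-system
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: example-repo
  path: "./deploy"              # illustrative path within the repository
  prune: true                   # remove cluster objects deleted from Git
```

Later in this post we create equivalent resources with the `flux create` CLI rather than applying manifests directly.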
In this setup, we start by creating vCenter configuration secrets, which are necessary to provision storage with vCenter. Then, we install External Secrets Operator to query access keys from AWS Secrets Manager, which are needed to set up the vSphere CSI driver. For this demonstration we use Secrets Manager to illustrate the approach; you can also use any other vault implementation. Next, we configure GitOps through Flux to deploy the vSphere CSI driver manifests from a Git repository. Finally, we deploy a stateful workload to validate the backup and restore capabilities of persistent volumes on vCenter storage through our vSphere CSI driver. The outline of this is shown in the preceding diagram.

Prerequisites

Make sure the following prerequisites are complete:

- A Linux-based host machine using an Amazon Elastic Compute Cloud (Amazon EC2) instance, an AWS Cloud9 instance, or a local machine with access to your AWS account.
- Configure admin access to the EKS Anywhere cluster from the host machine.
- Configure IAM Roles for Service Accounts (IRSA) on the EKS Anywhere cluster.
- Install the following tools on the host machine from the previous two steps:
  - AWS Command Line Interface (AWS CLI) version 2 to interact with AWS services using CLI commands.
  - Helm to deploy and manage Kubernetes applications.
  - kubectl to communicate with the Kubernetes API server.
  - eksctl and eksctl anywhere to create and manage the EKS Anywhere cluster.
  - Git to clone the necessary source repository from GitHub.
  - curl to make HTTP requests.
  - envsubst to substitute environment variables in shell.
  - jq to parse JSON responses from the AWS CLI.
  - Flux for creating the Git repository source.

Create the vCenter configuration secrets

The first step in our setup process is to create the necessary vCenter configuration secrets.
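Before proceeding, you can optionally confirm that the prerequisite CLIs are on the host's PATH. The helper function below is an illustrative sketch, not part of the original walkthrough:

```shell
# check_tools: report which of the given CLI tools are missing from PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all tools found"
}

# Check the prerequisite tools (jq is used later to parse AWS CLI output).
check_tools aws helm kubectl eksctl git curl envsubst jq flux || true
```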
Let's export a few vCenter details as environment variables:

```shell
export EKSA_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
export EKSA_OIDC_PROVIDER=<value of $ISSUER_HOSTPATH as configured in IRSA setup>
export EKSA_ES_SERVICE_ACCOUNT="external-secrets-sa"

# Comments reflect example values from the vSphere inventory above.
# Set these variables to reflect your vSphere cluster's environment.
export VSPHERE_USERNAME=<Your vCenter Admin Username>
export VSPHERE_PASSWORD=<Your vCenter Admin Password>
export VCENTER_DOMAIN_NAME=<Your vCenter Server Domain>    # sc2-rdops-vm06-dhcp-215-129.eng.vmware.com
export VSPHERE_DATA_CENTER_NAME=<Your Data Center Name>    # datacenter
export VSPHERE_CLUSTER_NAME=<Your Cluster Name>            # vSAN-cluster
export VCENTER_NAME=<Your vCenter Server Name>             # sc2-rdops-vm06-dhcp-215-129
export VSPHERE_IP_ADDRESS=$(getent hosts $VCENTER_DOMAIN_NAME | awk '{ print $1 }')
```

Next, let's create the configuration files that will be loaded into Secrets Manager. The heredoc delimiters are single-quoted so the `$VAR` references stay literal in the files; they are expanded later with `envsubst`:

```shell
cat << 'EOF' > vmwareconf.txt
global:
  port: 443
  insecureFlag: true
# vcenter section
vcenter:
  $VCENTER_NAME:
    server: $VSPHERE_IP_ADDRESS
    user: $VSPHERE_USERNAME
    password: $VSPHERE_PASSWORD
    datacenters:
      - $VSPHERE_DATA_CENTER_NAME
EOF

cat << 'EOF' > csi-vsphere-vars.txt
[Global]
insecure-flag = "true"
port = "443"

[VirtualCenter "$VSPHERE_IP_ADDRESS"]
cluster-id = "$VSPHERE_CLUSTER_NAME"
user = "$VSPHERE_USERNAME"
password = "$VSPHERE_PASSWORD"
datacenters = "$VSPHERE_DATA_CENTER_NAME"
EOF
```

Next, let's load the configuration secrets into Secrets Manager:

```shell
export VSPHERE_CONTROLLER_SECRET_ARN=$(aws secretsmanager create-secret \
  --name vsphere.conf \
  --secret-string "$(envsubst < vmwareconf.txt)" | jq -r '.ARN')

export CSI_DRIVER_SECRET_ARN=$(aws secretsmanager create-secret \
  --name csi-vsphere.conf \
  --secret-string "$(envsubst < csi-vsphere-vars.txt)" | jq -r '.ARN')
```

Installing external secrets operator

The next step in our setup process is to set up External Secrets to securely access the
vCenter Cloud Controller Manager and CSI driver configuration secrets from Secrets Manager.

First, let's create an AWS Identity and Access Management (IAM) policy and role to allow the cluster to access only the Secrets Manager secrets we created in the previous step:

```shell
cat << EOF > vmware-csi-secrets-reader-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:ListSecrets",
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "$VSPHERE_CONTROLLER_SECRET_ARN",
        "$CSI_DRIVER_SECRET_ARN"
      ]
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name vmware-csi-secrets-reader \
  --policy-document file://vmware-csi-secrets-reader-policy.json

export POLICY_ARN=$(aws iam list-policies \
  --query 'Policies[?PolicyName==`vmware-csi-secrets-reader`].Arn' \
  --output text)

cat << EOF > secrets-manager-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::$EKSA_ACCOUNT_ID:oidc-provider/$EKSA_OIDC_PROVIDER"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringLike": {
          "$EKSA_OIDC_PROVIDER:sub": [
            "system:serviceaccount:kube-system:$EKSA_ES_SERVICE_ACCOUNT",
            "system:serviceaccount:vmware-system-csi:$EKSA_ES_SERVICE_ACCOUNT"
          ]
        }
      }
    }
  ]
}
EOF

export ES_ROLEARN=$(aws iam create-role --role-name ${EKSA_ES_SERVICE_ACCOUNT}-role \
  --assume-role-policy-document file://secrets-manager-trust-policy.json \
  --query Role.Arn --output text)

aws iam attach-role-policy --role-name ${EKSA_ES_SERVICE_ACCOUNT}-role \
  --policy-arn $POLICY_ARN
```

Next, we deploy external-secrets through Helm to sync secrets between Secrets Manager and the EKS Anywhere cluster.
```shell
helm repo add external-secrets https://charts.external-secrets.io

helm install external-secrets \
  external-secrets/external-secrets \
  -n external-secrets \
  --create-namespace
```

Next, let's verify that external-secrets has been successfully deployed and all pods are ready:

```
> kubectl get pods -n external-secrets
NAME                                                READY   STATUS    RESTARTS   AGE
pod/external-secrets-5477599d89-7spkg                   1/1     Running   0          100s
pod/external-secrets-cert-controller-6cc64794fc-5czqj   1/1     Running   0          100s
pod/external-secrets-webhook-55555fc4fd-mncm5           1/1     Running   0          100s
```

To use IRSA for secrets retrieval, we need a service account in each namespace that uses external-secrets to assume the role. Since one of the service accounts resides in the vmware-system-csi namespace, we create that namespace now as well:

```shell
kubectl create ns vmware-system-csi

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${EKSA_ES_SERVICE_ACCOUNT}
  namespace: kube-system
  annotations:
    eks.amazonaws.com/role-arn: ${ES_ROLEARN}
    eks.amazonaws.com/audience: "sts.amazonaws.com"
    eks.amazonaws.com/sts-regional-endpoints: "true"
    eks.amazonaws.com/token-expiration: "86400"
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${EKSA_ES_SERVICE_ACCOUNT}
  namespace: vmware-system-csi
  annotations:
    eks.amazonaws.com/role-arn: ${ES_ROLEARN}
    eks.amazonaws.com/audience: "sts.amazonaws.com"
    eks.amazonaws.com/sts-regional-endpoints: "true"
    eks.amazonaws.com/token-expiration: "86400"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ${EKSA_ES_SERVICE_ACCOUNT}-cluster-role
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - nodes/proxy
      - services
      - endpoints
      - pods
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${EKSA_ES_SERVICE_ACCOUNT}-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: ${EKSA_ES_SERVICE_ACCOUNT}-cluster-role
subjects:
  - kind: ServiceAccount
    name: ${EKSA_ES_SERVICE_ACCOUNT}
    namespace: kube-system
  - kind: ServiceAccount
    name: ${EKSA_ES_SERVICE_ACCOUNT}
    namespace: vmware-system-csi
EOF
```

Note that the ClusterRoleBinding's roleRef must match the ClusterRole name created above (`${EKSA_ES_SERVICE_ACCOUNT}-cluster-role`), not the IAM role name.

Next, let's create the ClusterSecretStore, which is a cluster-scoped SecretStore that can be referenced by ExternalSecrets from all namespaces:

```shell
cat << EOF | kubectl apply -f -
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: eksa-secret-store
spec:
  provider:
    aws:                        # set secretStore provider to AWS
      service: SecretsManager   # configure the service to be Secrets Manager
      region: us-west-2         # Region where the secret is stored
      auth:
        jwt:
          serviceAccountRef:
            name: ${EKSA_ES_SERVICE_ACCOUNT}
EOF
```

Verify the ClusterSecretStore status using the following command:

```
> kubectl get clustersecretstore eksa-secret-store
NAME                AGE     STATUS   CAPABILITIES   READY
eksa-secret-store   2m38s   Valid    ReadWrite      True
```

Configure GitOps with Flux to install Cloud Controller Manager and vSphere CSI Driver

Note that you can skip the flux install step if you are already using a GitOps-enabled EKS Anywhere cluster, because the EKS Anywhere installation process installs Flux on your behalf. We use GitOps sync through Flux to handle the deployment of the CSI driver into our EKS Anywhere cluster.
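With the ClusterSecretStore in place, the manifests synced by Flux can pull each Secrets Manager entry into the cluster through an ExternalSecret resource. The following is a sketch of what such a manifest looks like; the resource name and refresh interval are illustrative (the actual manifests live in the Git repository used below), though the vSphere CSI driver does expect its configuration as a Kubernetes Secret in the vmware-system-csi namespace:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: vsphere-config-secret        # illustrative name
  namespace: vmware-system-csi
spec:
  refreshInterval: 1h                # how often to re-sync from Secrets Manager
  secretStoreRef:
    kind: ClusterSecretStore
    name: eksa-secret-store          # the store created above
  target:
    name: vsphere-config-secret      # Kubernetes Secret to create
    creationPolicy: Owner
  data:
    - secretKey: csi-vsphere.conf    # key inside the Kubernetes Secret
      remoteRef:
        key: csi-vsphere.conf        # Secrets Manager secret created earlier
```

External Secrets Operator assumes the IAM role via the annotated service account and materializes the remote secret as a regular Kubernetes Secret that the CSI driver pods can mount.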
Deploy Flux in your EKS Anywhere cluster using the following commands:

```shell
flux install

flux create source git vmware-csi \
  --url=https://github.com/aws-samples/containers-blog-maelstrom \
  --branch=main

flux create kustomization csi-driver-main \
  --source=GitRepository/vmware-csi \
  --path="./vmware-csi-driver-gitops" \
  --prune=true \
  --interval=1m \
  --namespace=flux-system
```

Verify that the CSI driver installation is successful

Check that the Cloud Controller Manager and CSI driver were installed successfully and that the storage class was created:

```
> kubectl get pods -n kube-system -l name=vsphere-cloud-controller-manager
NAME                                         READY   STATUS    RESTARTS   AGE
pod/vsphere-cloud-controller-manager-5mcm7   1/1     Running   0          5m58s
pod/vsphere-cloud-controller-manager-9zqgq   1/1     Running   0          5m58s
pod/vsphere-cloud-controller-manager-rhgm9   1/1     Running   0          5m58s

> kubectl get pods -n vmware-system-csi
NAME                                          READY   STATUS    RESTARTS        AGE
pod/vsphere-csi-controller-84bb459bd5-8llmm   7/7     Running   0               3m33s
pod/vsphere-csi-controller-84bb459bd5-mh922   7/7     Running   0               3m33s
pod/vsphere-csi-controller-84bb459bd5-vrfkw   7/7     Running   0               3m33s
pod/vsphere-csi-node-g6jfr                    3/3     Running   1 (3m20s ago)   3m33s
pod/vsphere-csi-node-gmlpd                    3/3     Running   2 (3m19s ago)   3m33s
pod/vsphere-csi-node-lmfvq                    3/3     Running   1 (3m21s ago)   3m33s
pod/vsphere-csi-node-s4cdt                    3/3     Running   2 (3m20s ago)   3m33s
pod/vsphere-csi-node-xqj7z                    3/3     Running   1 (3m20s ago)   3m33s
pod/vsphere-csi-node-z6rbp                    3/3     Running   2 (3m20s ago)   3m33s

> kubectl get sc
NAME                  PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
vmware-sc (default)   csi.vsphere.vmware.com   Delete          Immediate           false                  4h42m
```

Verify GitOps deployment of sample stateful workload along with backup and restore

Finally, we validate our GitOps setup, which deployed a sample stateful workload, and verify the backup and restore capabilities of the deployed vSphere CSI driver. This sample stateful workload deployed a sample app, which created a volume and then created a snapshot of it.
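The storage objects behind this sample follow the standard Kubernetes patterns. The following is a sketch of the claim, snapshot, and restore manifests, reconstructed from the resource names that appear in the command outputs that follow; the actual manifests live in the sample Git repository:

```yaml
# Claim 4Gi from the vmware-sc storage class (dynamically provisioned on vCenter).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vmware-csi-claim
spec:
  storageClassName: vmware-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
---
# Take a point-in-time snapshot of the claim above.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: vmware-csi-volume-snapshot
spec:
  volumeSnapshotClassName: vmware-csi-snapshotclass
  source:
    persistentVolumeClaimName: vmware-csi-claim
---
# Restore: a new claim whose dataSource is the snapshot, backing the app-restore pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-vmware-csi-claim
spec:
  storageClassName: vmware-sc
  dataSource:
    name: vmware-csi-volume-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi
```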
```
> kubectl get pods -l job-name=app
NAME            READY   STATUS    RESTARTS   AGE
pod/app-4677h   1/1     Running   0          3m1s
```

To manage storage for stateful workloads, the vSphere CSI driver uses two API resources of the PersistentVolume subsystem: PersistentVolume (PV) and PersistentVolumeClaim (PVC).

A PVC is a request for storage by a user. It is similar to a Pod: Pods consume node resources, and PVCs consume PV resources. Pods can request specific levels of resources (CPU and memory); claims can request a specific size and access modes (for example, they can be mounted ReadWriteOnce, ReadOnlyMany, ReadWriteMany, or ReadWriteOncePod; see AccessModes).

A PV is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using storage classes. It is a resource in the cluster, just like a node is a cluster resource. PVs are volume plugins like Volumes, but they have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

Run the following commands to see the PVC of the sample stateful workload, which is bound to our vmware-sc storage class, and a PV dynamically provisioned on vCenter storage:

```
> kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE
pvc-5ec73b4b-3d1e-41a6-8cac-5502094462eb   4Gi        RWO            Delete           Bound    default/vmware-csi-claim   vmware-sc               44s

> kubectl get pvc
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
vmware-csi-claim   Bound    pvc-5ec73b4b-3d1e-41a6-8cac-5502094462eb   4Gi        RWO            vmware-sc      114s
```

Figure 2: vCenter storage UI showing the test workload's volume

Similar to how the PersistentVolume and PersistentVolumeClaim API resources are used to provision volumes, the VolumeSnapshotContent and VolumeSnapshot API resources are provided to create volume snapshots.
A VolumeSnapshotContent is a snapshot taken from a volume in the cluster, and it is a cluster resource just like a PV. A VolumeSnapshot is a request for a snapshot, similar to a PersistentVolumeClaim.

Next, let's check on the VolumeSnapshot, a point-in-time snapshot of our volume that can be used for restoring the storage of the stateful workload:

```
> kubectl get volumesnapshot
NAME                         READYTOUSE   SOURCEPVC          SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS              SNAPSHOTCONTENT                                    CREATIONTIME   AGE
vmware-csi-volume-snapshot   true         vmware-csi-claim                           4Gi           vmware-csi-snapshotclass   snapcontent-6034c162-e256-4557-b0b9-08f545d723a6   3m35s          4m8s
```

Finally, let's validate the restore operation on the stateful workload by checking the workload that uses the created point-in-time snapshot:

```
> kubectl get pods -l kustomize.toolkit.fluxcd.io/name=storage-tester
NAME          READY   STATUS    RESTARTS   AGE
app-restore   1/1     Running   0          4m6s
```

With the restored pod created, we can now see that an additional volume has been created, both in the vCenter UI and on our cluster, by running the following command again:

```
> kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
pvc-5ec73b4b-3d1e-41a6-8cac-5502094462eb   4Gi        RWO            Delete           Bound    default/vmware-csi-claim            vmware-sc               13m
pvc-b0f57d0c-ca93-45a4-9ea3-b7adc1c0f096   4Gi        RWO            Delete           Bound    default/restored-vmware-csi-claim   vmware-sc               2s
```

Figure 3: vCenter storage UI showing the snapshot restore test workload's volume

Cleaning up

To avoid incurring future charges, clean up the EKS Anywhere cluster resources and AWS resources created during this walkthrough:

```shell
# Clean up EKS Anywhere resources
flux delete kustomization -n flux-system csi-driver-main
kubectl delete sc vmware-sc
kubectl delete clustersecretstore eksa-secret-store
kubectl delete clusterrolebinding ${EKSA_ES_SERVICE_ACCOUNT}-role-binding
kubectl delete clusterrole ${EKSA_ES_SERVICE_ACCOUNT}-cluster-role
kubectl delete sa ${EKSA_ES_SERVICE_ACCOUNT} -n kube-system
kubectl delete sa ${EKSA_ES_SERVICE_ACCOUNT} -n vmware-system-csi
kubectl delete ns vmware-system-csi
kubectl delete secret snapshot-webhook-certs -n kube-system
helm uninstall external-secrets -n external-secrets
rm -fv ./vmwareconf.txt ./csi-vsphere-vars.txt ./vmware-csi-secrets-reader-policy.json ./secrets-manager-trust-policy.json
kubectl delete ns external-secrets
flux uninstall

# Clean up AWS resources
aws iam detach-role-policy --role-name ${EKSA_ES_SERVICE_ACCOUNT}-role \
  --policy-arn $POLICY_ARN
aws iam delete-policy \
  --policy-arn $POLICY_ARN
aws iam delete-role \
  --role-name ${EKSA_ES_SERVICE_ACCOUNT}-role
aws secretsmanager delete-secret \
  --region ${AWS_REGION} \
  --secret-id $VSPHERE_CONTROLLER_SECRET_ARN \
  --recovery-window-in-days=7
aws secretsmanager delete-secret \
  --region ${AWS_REGION} \
  --secret-id $CSI_DRIVER_SECRET_ARN \
  --recovery-window-in-days=7
```

Conclusion

In this post, we demonstrated the process of using GitOps to deploy a vSphere CSI driver on an EKS Anywhere cluster in a vSphere environment. We then deployed a stateful workload to the cluster using the vSphere CSI driver and walked through the underlying persistent volume claim and persistent volume creation, which dynamically provisioned a volume on vCenter storage. Finally, we backed up the volume by creating a point-in-time snapshot and performed a restore of the stateful workload from that snapshot. Users looking to run stateful workloads on EKS Anywhere clusters on vSphere can follow this approach to operate stateful workloads at scale.

To learn more about managing your EKS Anywhere environment, check the following resources:

- EKS Anywhere curated package management
- Blue/Green Kubernetes upgrades for Amazon EKS Anywhere using Flux
- Monitoring Amazon EKS Anywhere using Amazon Managed Service for Prometheus and Amazon Managed Grafana