Search the Community
Showing results for tags 'hcp nomad'.
-
hcp vault Running Vault on HashiCorp Nomad, Part 1
Hashicorp posted a topic in Infrastructure-as-Code
Vault is a secrets management platform that provides encrypted storage for long lived secrets, identity brokerage using ephemeral credentials, and encryption as a service. Unless you’re using HashiCorp Cloud Platform to host Vault (which is always recommended if you can support it), deploying and running Vault clusters will likely be a manual process. Any time a server needs to be restarted, an engineer would need to login and restart the service. This is what orchestrators like HashiCorp Nomad and Kubernetes were built to automate. While Kubernetes has a wide array of components to manage additional use cases, Nomad is mainly focused on scheduling and cluster management. It’s simple to run, lightweight, and supports running VMs, containers, raw executables, JAR files, Qemu workloads, and more with custom task driver plugins. By running Vault as a Nomad job (Nomad’s term for workloads), operators can manage and schedule Vault servers with a low-complexity architecture. This post shows how to deploy and configure Vault servers on Nomad using HCP Terraform. The secrets consumption will be done using the Nomad and Vault CLI’s, respectively, to show the underlying workflows. The Terraform code will be split in two, with separate configuration for the infrastructure and the Vault deployment. This is done to manage the states for these workspaces separately and share dependency outputs between them. Deployment architecture This deployment architecture requires five virtual machines (VMs) — one is the Nomad server, and the other four are the Nomad clients that run the Vault servers, including a backup server for Vault. These VMs will be deployed to Amazon EC2 instances. The VMs will all live in the same virtual private cloud (VPC) subnet. HCP Terraform and directory setup Because this approach splits the architecture into multiple workspaces, you need to configure remote backends for each HCP Terraform workspace so that output dependencies can be shared between them. To create these workspaces, create a directory structure that contains a folder for each workspace. The directory structure should look like this: ├── 1-nomad-infrastructure ├── 2-nomad-configuration ├── 3-nomad-example-job-deployment 3 directories The remote backend is HCP Terraform. To create the remote backends, create a file called backend.tf in each of the directories. Here is a shell script that will create the directory structure and write the relevant backend.tf files in all of the directories. Networking for the Nomad cluster To create the infrastructure for Nomad, navigate to the 1-nomad-infrastructure directory. First, set up your AWS Terraform provider. Here is the provider.tf code. Once the provider is configured, you’re ready to deploy a VPC and a subnet. To do this, there is another file in the same directory, called network.tf, which contains the code below: module "vpc" { source = "terraform-aws-modules/vpc/aws" name = "my-vpc" cidr = "10.0.0.0/16" azs = ["eu-west-1a"] private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"] public_subnets = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"] enable_nat_gateway = true enable_vpn_gateway = false enable_dns_support = true enable_dns_hostnames = true tags = { Terraform = "true" Environment = "dev" } }This code deploys all the resources required for a fully functional network, including resources for a working AWS VPC, associated subnets, and NAT gateways. It uses the community Terraform module called the AWS VPC Terraform module, available on the Terraform Registry. Configuration of Nomad servers Before you can write the Terraform code to deploy the five VMs, you need to write some shell scripts to configure the servers during their deployment as a prerequisite. The first is for the Nomad server called nomad-server.sh: #! /bin/bash -e # Install Nomad sudo apt-get update && \ sudo apt-get install wget gpg coreutils -y wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list sudo apt-get update && sudo apt-get install nomad -y # Create Nomad directory. mkdir -p /etc/nomad.d # Nomad configuration files cat < /etc/nomad.d/nomad.hcl log_level = "DEBUG" data_dir = "/etc/nomad.d/data" server { enabled = true bootstrap_expect = ${NOMAD_SERVER_COUNT} server_join { retry_join = ["provider=aws tag_value=${NOMAD_SERVER_TAG} tag_key=${NOMAD_SERVER_TAG_KEY}"] } } autopilot { cleanup_dead_servers = true last_contact_threshold = "200ms" max_trailing_logs = 250 server_stabilization_time = "10s" enable_redundancy_zones = false disable_upgrade_migration = false enable_custom_upgrades = false } EOF cat < /etc/nomad.d/acl.hcl acl = { enabled = true } EOF systemctl enable nomad systemctl restart nomadThis script does a number of things to configure the Nomad server: Installs the Nomad binary. Creates the Nomad directory that contains everything it needs to function. Creates a Nomad configuration file for the server and places it in the Nomad directory created in step 2. This configuration uses a feature called cloud auto-join that looks for pre-specified tags on the VM and automatically joins any VMs with these tags to a Nomad cluster. Enables access control lists (ACLs) for Nomad. Starts the Nomad service. This script runs on the Nomad server VM when deployed using cloud-init. Notice the script contains three variables for cloud auto-join: ${NOMAD_SERVER_COUNT}, ${NOMAD_SERVER_TAG_KEY}, and ${NOMAD_SERVER_TAG}. The values of these variables will be rendered by Terraform. To deploy this VM and run this script, use the file named compute.tf, which is in the directory mentioned above: data "aws_ami" "ubuntu" { most_recent = true filter { name = "name" values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"] } filter { name = "virtualization-type" values = ["hvm"] } owners = ["099720109477"] # Canonical }The code above uses a data source to locate the Amazon Machine Image (AMI) used to deploy the VM instance. The code below creates a security group attached to the VM, allowing SSH access to it on port 22: resource "aws_security_group" "ssh" { vpc_id = module.vpc.vpc_id name = "allow_ssh" ingress { from_port = 22 protocol = "tcp" to_port = 22 cidr_blocks = [ "0.0.0.0/0" ] } tags = { Name = "allow_ssh" } }The following code creates another security group that allows access to Nomad’s default port, which is port 4646. This is attached to the Nomad server VM: resource "aws_security_group" "nomad" { vpc_id = module.vpc.vpc_id name = "nomad_port" ingress { from_port = 4646 protocol = "tcp" to_port = 4648 cidr_blocks = [ "0.0.0.0/0" ] } tags = { Name = "nomad" } }The next piece of code creates a security group to allow egress connections outside of the network: resource "aws_security_group" "egress" { vpc_id = module.vpc.vpc_id name = "egress" egress { from_port = 0 protocol = "-1" to_port = 0 cidr_blocks = [ "0.0.0.0/0" ] } tags = { Name = "egress" } }The code below creates the final security group of this walkthrough that allows access to Vault on its default port of 8200. This will be attached to the Nomad clients that will run Vault: resource "aws_security_group" "vault" { vpc_id = module.vpc.vpc_id name = "vault" ingress { from_port = 8200 protocol = "tcp" to_port = 8201 cidr_blocks = [ "0.0.0.0/0" ] } tags = { Name = "vault" } }The code below creates an IP address for the Nomad server in the first resource and associates it with the Nomad server VM in the second. Terraform will not perform the association until the Nomad server VM has been deployed. The IP address is deployed first because it is used in the Nomad config file that is deployed as part of the VM provisioning to enable the OIDC discovery URL on the Nomad server. resource "aws_eip" "nomad_server" { tags = { Name = "Nomad Server" } } resource "aws_eip_association" "nomad_server" { instance_id = aws_instance.nomad_servers.id allocation_id = aws_eip.nomad_server.id }The other prerequisite for deploying the VM for the Nomad server is an SSH key pair, which enables authentication to the VM when connecting via SSH: resource "aws_key_pair" "deployer" { key_name = "deployer-key" public_key = file(var.ssh_key) } The code below deploys the VM for the Nomad server: resource "aws_instance" "nomad_servers" { ami = data.aws_ami.ubuntu.id instance_type = "t3.micro" subnet_id = module.vpc.public_subnets.0 key_name = aws_key_pair.deployer.key_name user_data = templatefile("./servers.sh", { NOMAD_SERVER_TAG = "true" NOMAD_SERVER_TAG_KEY = "nomad_server" NOMAD_SERVER_COUNT = 1 NOMAD_ADDR = aws_eip.nomad_server.public_ip }) vpc_security_group_ids = [ aws_security_group.ssh.id, aws_security_group.egress.id, aws_security_group.nomad.id ] lifecycle { ignore_changes = [ user_data, ami ] } tags = { Name = "Nomad Server" nomad_server = true } }Some points to note about this code: This resource usescloud-init to deploy the script to the VM that installs the dependent packages and configure the server for Nomad. The script is rendered using the templatefile function and populates the variable values in the script template with the values specified in the above resource: NOMAD_SERVER_TAG: for cloud auto-join NOMAD_SERVER_TAG_KEY: for cloud auto-join NOMAD_SERVER_COUNT: to specify how many servers Nomad expects to join the cluster NOMAD_ADDR: to configure Nomad’s OIDC discovery URL The data source used to obtain the AMI will always fetch the latest version. This means the VM could be deployed more if desired. The lifecycle block ignores changes specifically related to the AMI ID. It associates the security groups created above to the VM. It also adds the SSH key pair to the VM to aid in SSH authentication. This is useful for troubleshooting. Configuring Nomad clients for the Vault servers To deploy the Nomad clients for Vault, you take a similar approach to deploying the server. The main difference is the cloud-init script and the number of servers deployed. The script below (client.sh) is used to configure each Nomad client: #! /bin/bash -e # Install the CNI Plugins curl -L https://github.com/containernetworking/plugins/releases/download/v0.9.1/cni-plugins-linux-amd64-v0.9.1.tgz -o /tmp/cni.tgz mkdir -p /opt/cni/bin tar -C /opt/cni/bin -xzf /tmp/cni.tgz # Install Nomad sudo apt-get update && \ sudo apt-get install wget gpg coreutils -y wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list sudo apt-get update && sudo apt-get install nomad -y # Create Nomad directory. mkdir -p /etc/nomad.d # Create Vault directory. mkdir -p /etc/vault.d # Add Docker's official GPG key: sudo apt-get update sudo apt-get install ca-certificates curl -y sudo install -m 0755 -d /etc/apt/keyrings sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc sudo chmod a+r /etc/apt/keyrings/docker.asc # Add the repository to Apt sources: echo \ "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \ $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \ sudo tee /etc/apt/sources.list.d/docker.list > /dev/null sudo apt-get update # Install Docker sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y # Install Java sudo apt install default-jre -y # Nomad configuration files cat < /etc/nomad.d/nomad.hcl log_level = "DEBUG" data_dir = "/etc/nomad.d/data" client { enabled = true node_pool = "vault-servers" node_class = "vault-servers" server_join { retry_join = ["${NOMAD_SERVERS_ADDR}"] } host_volume "/etc/vault.d" { path = "/etc/vault.d" read_only = false } } plugin "docker" { config { allow_privileged = true } } autopilot { cleanup_dead_servers = true last_contact_threshold = "200ms" max_trailing_logs = 250 server_stabilization_time = "10s" enable_redundancy_zones = false disable_upgrade_migration = false enable_custom_upgrades = false } EOF cat < /etc/nomad.d/acl.hcl acl = { enabled = true } EOF systemctl enable nomad systemctl restart nomadThe script above is similar to that of the Nomad server. One key difference is the Nomad client configuration file, which includes the following configuration items: Node pool: This Nomad feature can group pieces of compute infrastructure together. In this case, you want dedicated servers to run your Vault cluster to reduce the security blast radius. Grouping these servers lets you specify what node pool the Vault servers should run on and create policies around this to prevent other Nomad jobs from being deployed to these client nodes. Host volume: Vault can store encrypted secrets that other authorized Nomad jobs can retrieve. This means that Vault is a stateful workload that requires persistent storage. Host volumes expose a volume on the VM to Nomad and lets you mount the volume to a Nomad job. This means that if a job is restarted, it can still access its data. Docker plugin: The Docker plugin is enabled because Vault will be run in a container as a Nomad job. This has the potential to ease the upgrade paths by simply changing the tag on the image used. Before deploying the Nomad clients, you need to perform a quick health check on the Nomad server to ensure it is available for clients to join it. To check this, use TerraCurl to make an API call to the Nomad server to check its status: resource "terracurl_request" "nomad_status" { method = "GET" name = "nomad_status" response_codes = [200] url = "http://${aws_eip.nomad_server.public_ip}:4646/v1/status/leader" max_retry = 4 retry_interval = 10 depends_on = [ aws_instance.nomad_servers, aws_eip_association.nomad_server ] }This checks that Nomad has an elected leader in the cluster and expects a 200 response. If it does not get the desired response, TerraCurl will retry every 10 seconds for a maximum of 4 retries. This prevents potential race conditions between the Nomad server and the clients’ provisioning. Now you’re ready to deploy the Nomad client VMs. This is similar to the server deployed before, with a few key differences: Vault requires more compute power so the instance type is bigger. It uses the count feature because you need three Vault nodes. The script rendered by the templatefile function needs only one variable value this time (the Nomad server address). resource "aws_instance" "nomad_clients" { count = 3 ami = data.aws_ami.ubuntu.id instance_type = "t3.medium" subnet_id = module.vpc.public_subnets.0 key_name = aws_key_pair.deployer.key_name associate_public_ip_address = true user_data = templatefile("./clients.sh", { NOMAD_SERVERS_ADDR = "${aws_instance.nomad_servers.private_ip}" }) vpc_security_group_ids = [ aws_security_group.ssh.id, aws_security_group.egress.id, aws_security_group.nomad.id, aws_security_group.vault.id ] tags = { Name = "Vault on Nomad Client ${count.index + 1}" nomad_server = false } lifecycle { ignore_changes = [ user_data, ami ] } depends_on = [ terracurl_request.nomad_status ] }Configuration of the Nomad client for the Vault backup server The next piece of infrastructure to deploy is the VM used as the Vault backup server. The server makes backups of the Vault cluster. It’s best practice to store backups away from the Vault cluster, so create a separate node pool for the backup server. Run the vault-backup-server.sh script below, which is located in the same directory you’ve been working in so far: #! /bin/bash -e # Install the CNI Plugins curl -L https://github.com/containernetworking/plugins/releases/download/v0.9.1/cni-plugins-linux-amd64-v0.9.1.tgz -o /tmp/cni.tgz mkdir -p /opt/cni/bin tar -C /opt/cni/bin -xzf /tmp/cni.tgz # Install Nomad sudo apt-get update && \ sudo apt-get install wget gpg coreutils -y wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list sudo apt-get update && sudo apt-get install nomad -y # Create Nomad directory. mkdir -p /etc/nomad.d # Install Vault sudo apt-get install vault -y # Create Vault directory. mkdir -p /etc/vault.d # Add Docker's official GPG key: sudo apt-get update sudo apt-get install ca-certificates curl -y sudo install -m 0755 -d /etc/apt/keyrings sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc sudo chmod a+r /etc/apt/keyrings/docker.asc # Add the repository to Apt sources: echo \ "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \ $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \ sudo tee /etc/apt/sources.list.d/docker.list > /dev/null sudo apt-get update # Install Docker sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin -y # Install Java sudo apt install default-jre -y # Nomad configuration files cat < /etc/nomad.d/nomad.hcl log_level = "DEBUG" data_dir = "/etc/nomad.d/data" client { enabled = true node_pool = "vault-backup" node_class = "vault-backup" server_join { retry_join = ["${NOMAD_SERVERS_ADDR}"] } host_volume "vault_vol" { path = "/etc/vault.d" read_only = false } } plugin "docker" { config { allow_privileged = true } } autopilot { cleanup_dead_servers = true last_contact_threshold = "200ms" max_trailing_logs = 250 server_stabilization_time = "10s" enable_redundancy_zones = false disable_upgrade_migration = false enable_custom_upgrades = false } EOF cat < /etc/nomad.d/acl.hcl acl = { enabled = true } EOF systemctl restart nomadNext, you need to actually deploy the VM using the code below, which is added to the compute.tf file: resource "aws_instance" "nomad_clients_vault_backup" { count = 1 ami = data.aws_ami.ubuntu.id instance_type = "t3.medium" subnet_id = module.vpc.public_subnets.0 key_name = aws_key_pair.deployer.key_name associate_public_ip_address = true user_data = templatefile("./vault-backup-server.sh", { NOMAD_SERVERS_ADDR = "${aws_instance.nomad_servers.private_ip}" }) vpc_security_group_ids = [ aws_security_group.ssh.id, aws_security_group.egress.id, aws_security_group.nomad.id, aws_security_group.vault.id ] tags = { Name = "Vault backup server" nomad_server = false } lifecycle { ignore_changes = [ user_data, ami ] } depends_on = [ terracurl_request.nomad_status ] }Nomad ACL configuration As part of the Nomad server configuration deployed by the cloud-init script, ACLs were enabled. This means that an access token is required before any actions can be performed. As this is a new install of Nomad, a token does not exist yet. To bootstrap Nomad’s ACL system with the initial management token, you can use Nomad’s API and TerraCurl. TerraCurl is useful in this scenario because the Nomad Terraform provider does not support this bootstrapping functionality. You can write the TerraCurl resource to bootstrap the Nomad ACL system. There is a file in the 1-nomad-infrastructure directory called nomad.tf that includes the following code: resource "terracurl_request" "bootstrap_acl" { method = "POST" name = "bootstrap" response_codes = [200, 201] url = "http://${aws_instance.nomad_servers.public_ip}:4646/v1/acl/bootstrap" }This code makes a POST API call to the Nomad server using the public IP address that was assigned to the VM during creation. It uses Terraform’s interpolation and joins that to the Nomad API endpoint /v1/acl/bootstrap to make the call. This resource tells Terraform to expect either a 200 or 201response code from the API call or Terraform will fail. The response body from the API call is stored in state. Terraform outputs In order to provide some of the computed values to other workspaces, you need to output them. To do this, create a file called outputs.tf in the same directory as above and insert the following code: output "nomad_server_public_ip" { value = aws_eip.nomad_server.public_ip } output "nomad_server_private_ip" { value = aws_instance.nomad_servers.private_ip } output "nomad_clients_private_ips" { value = aws_instance.nomad_clients.*.private_ip } output "nomad_clients_public_ips" { value = aws_instance.nomad_clients.*.public_ip } output "terraform_management_token" { value = nomad_acl_token.terraform.secret_id sensitive = true } output "nomad_ui" { value = "http://${aws_eip.nomad_server.public_ip}:4646" }The directory structure should now look like this: ├── 1-nomad-infrastrcuture │ ├── anonymous-policy.hcl │ ├── backend.tf │ ├── client-policy.hcl │ ├── clients.sh │ ├── compute.tf │ ├── network.tf │ ├── nomad-client-vault-backup.sh │ ├── nomad.tf │ ├── outputs.tf │ ├── providers.tf │ ├── servers.sh │ └── variables.tf ├── 2-nomad-configuration │ ├── backend.tf └── 3-nomad-job-example-deployment └── backend.tf 3 directories, 14 files Now you can run terraform plan and terraform apply to create these resources. In the next blog This blog post showed how to deploy the infrastructure required to run Vault on Nomad. It covered some Terraform directory structure concepts and how they relate to workspaces. It also covered deploying and configuring Nomad, as well as Nomad ACLs. Part 2 of this blog series will look at deploying Vault as a Nomad job and configuring it, while Part 3 will explore deploying some automation to assist in the day-to-day operations of Vault. View the full article -
One of the core goals of the HashiCorp Nomad team is to enable users to run workloads the way they like. That goal extends to the Red Hat Podman container engine for running OCI Linux containers. Through community-driven efforts, running Podman tasks in Nomad has been possible with the Podman task driver plugin for some time. Thanks to recent community efforts, we’ve made progress maturing the Podman and Nomad integration for high-value applications and workflows. This post shares some of the new features of the Nomad-Podman integration. The Nomad-Podman integration has matured The latest enhancements to the Nomad-Podman integration add support for: Running Podman containers in task groups with bridge networking enabled Authentication options in the task driver config Specifying a credentials helper or external credentials configuration file for working with private registries (via Podman driver support) Although there is still work to be done, the progress made opens doors to new Podman use cases that were tedious, if not impossible, before. Podman and Consul service mesh Nomad 1.7, released in GA this month, takes advantage of these developments to provide a more seamless experience when using Podman in conjunction with our HashiCorp Consul service mesh integration. Consul service mesh requires the use of an Envoy sidecar proxy for each task participating in the network mesh. Since its inception, the Nomad integration for Consul service mesh has defaulted to a Docker-specific task configuration for these Envoy sidecars. Now, Nomad will automatically detect if Podman is the most suitable task driver and generate an Envoy task configuration that runs Envoy as a Podman container. Job submitters no longer need to specify a custom connect.sidecar_task block. Why use Podman instead of Docker? There are substantial advantages to using Podman as a task driver instead of Docker: Podman does not require running dockerd and containerd as root processes on every node. Podman makes clever use of systemd socket activation to launch containers. Most mainstream Linux distributions are based on systemd, providing widespread out-of-the-box support. Unlike Docker, Podman does not require pause containers to support the bridge networking use case Earlier in this post, we discussed how Envoy containers run alongside each Nomad task when using the Consul service mesh integration. With Docker, each task group with one of those Envoy containers actually requires a second sidecar — the pause container — which is necessary due to the special way Docker creates network namespaces. Since Podman works with the conventional network namespaces created and managed by the Nomad client, there is no need for that extra sidecar. In a cluster with thousands of tasks participating in a service mesh, the savings on CPU, memory, disk, and bandwidth can be significant. More maturity on the way These improvements to the Nomad-Podman integration story are only the beginning. Going forward, we intend to provide first-class support for rootless Podman, address outstanding bug reports, enhance our Podman integration test coverage, and update the plugin-packaging. To find out more about the Nomad-Podman integration, please visit our Podman task driver page. View the full article
-
Forum Statistics
67.4k
Total Topics65.3k
Total Posts