Showing results for tags 'dockerfiles'.
-
Dockerfiles are fundamental tools for developers working with Docker, serving as a blueprint for creating Docker images. These text documents contain all the commands a user could call on the command line to assemble an image. Understanding and effectively utilizing Dockerfiles can significantly streamline the development process, allowing for the automation of image creation and ensuring consistent environments across different stages of development. Dockerfiles are pivotal in defining project environments, dependencies, and the configuration of applications within Docker containers. With new versions of the BuildKit builder toolkit, Docker Buildx CLI, and Dockerfile frontend for BuildKit (v1.7.0), developers now have access to enhanced Dockerfile capabilities. This blog post delves into these new Dockerfile capabilities and explains how you can leverage them in your projects to further optimize your Docker workflows.

Versioning

Before we get started, here's a quick reminder of how Dockerfile is versioned and what you should do to update it. Although most projects use Dockerfiles to build images, BuildKit is not limited to that format. BuildKit supports multiple different frontends for defining the build steps for BuildKit to process. Anyone can create these frontends, package them as regular container images, and load them from a registry when you invoke the build.

With the new release, we have published two such images to Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs. To use these frontends, you need to specify a #syntax directive at the beginning of the file to tell BuildKit which frontend image to use for the build. Here we have set it to use the latest of the 1.x.x major version. For example:

```dockerfile
#syntax=docker/dockerfile:1

FROM alpine
...
```

This means that BuildKit is decoupled from the Dockerfile frontend syntax. You can start using new Dockerfile features right away without worrying about which BuildKit version you're using. All the examples described in this article will work with any version of Docker that supports BuildKit (the default builder as of Docker 23), as long as you define the correct #syntax directive at the top of your Dockerfile. You can learn more about Dockerfile frontend versions in the documentation.

Variable expansions

When you write Dockerfiles, build steps can contain variables that are defined using the build arguments (ARG) and environment variables (ENV) instructions. The difference between build arguments and environment variables is that environment variables are kept in the resulting image and persist when a container is created from it. When you use such variables, you most likely use ${NAME} or, more simply, $NAME in COPY, RUN, and other commands. You might not know that Dockerfile supports two forms of Bash-like variable expansion:

- ${variable:-word}: Sets a value to word if the variable is unset
- ${variable:+word}: Sets a value to word if the variable is set

Up to this point, these special forms were not that useful in Dockerfiles because the default value of ARG instructions can be set directly:

```dockerfile
FROM alpine
ARG foo="default value"
```

If you are an expert in various shell applications, you know that Bash and other tools usually have many additional forms of variable expansion to ease the development of your scripts. In Dockerfile v1.7, we have added:

- ${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix from the variable's value
- ${variable%pattern} and ${variable%%pattern} to remove the shortest or longest suffix from the variable's value
- ${variable/pattern/replacement} to replace the first occurrence of a pattern
- ${variable//pattern/replacement} to replace all occurrences of a pattern

How these rules are used might not be completely obvious at first. So, let's look at a few examples seen in actual Dockerfiles.

For example, projects often can't agree on whether versions for downloading your dependencies should have a "v" prefix or not. The following allows you to get the format you need:

```dockerfile
# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'
```

In the next example, multiple variants are used by the same project:

```dockerfile
ARG VERSION=v1.7.13
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz /
```

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables like TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, in containers and the Go ecosystem, we refer to 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

```dockerfile
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /
```

In this case, the URL also uses a custom name for AMD64 architecture. To pass a variable through multiple expansions, use another ARG definition with an expansion from the previous value. You could also write all the definitions on a single line, as ARG allows multiple parameters, though that may hurt readability.

```dockerfile
ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${ARCH}.zip /
```

Note that the example above is written so that if a user passes their own --build-arg ARCH=value, then that value is used as-is.

Now, let's look at how new expansions can be useful in multi-stage builds. One of the techniques described in "Advanced multi-stage build patterns" shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and want to run additional COPY or RUN commands only for specific platforms. If this method is new to you, you can learn more about it from that post.

In summarized form, the idea is to define a global build argument and then define build stages that use the build argument value in the stage name while pointing to the base of your target stage via the build-arg name. Old example:

```dockerfile
ARG BUILD_VERSION=1

FROM alpine AS base
RUN …

FROM base AS branch-version-1
RUN touch version1

FROM base AS branch-version-2
RUN touch version2

FROM branch-version-${BUILD_VERSION} AS after-condition

FROM after-condition
RUN …
```

When using this pattern for multi-platform builds, one of the limitations is that all the possible values for the build-arg need to be defined by your Dockerfile. This is problematic, as we want the Dockerfile to build on any platform rather than being limited to a specific set. You can see other examples here and here of Dockerfiles where dummy stage aliases must be defined for all architectures, and no other architecture can be built. Instead, the pattern we would like to use is that one architecture has a special behavior, and everything else shares another common behavior.
With new expansions, we can write this to demonstrate running special commands only on RISC-V, which is still somewhat new and may need custom behavior:

```dockerfile
#syntax=docker/dockerfile:1.7

ARG ARCH=${TARGETARCH#riscv64}
ARG ARCH=${ARCH:+"common"}
ARG ARCH=${ARCH:-$TARGETARCH}

FROM --platform=$BUILDPLATFORM alpine AS base-common
ARG TARGETARCH
RUN echo "Common build, I am $TARGETARCH" > /out

FROM --platform=$BUILDPLATFORM alpine AS base-riscv64
ARG TARGETARCH
RUN echo "Riscv only special build, I am $TARGETARCH" > /out

FROM base-${ARCH} AS base
```

Let's look at these ARCH definitions more closely. The first sets ARCH to TARGETARCH but removes riscv64 from the value. Next, as we described previously, we don't actually want the other architectures to use their own values but instead want them all to share a common value. So, we set ARCH to common except if it was cleared by the previous riscv64 rule. Now, if we still have an empty value, we default it back to $TARGETARCH. The last definition is optional, as we would already have a unique value for both cases, but it makes the final stage name base-riscv64 nicer to read.

Additional examples of including multiple conditions with shared conditions, or conditions based on architecture variants, can be found in this GitHub Gist page.

Comparing this example to the initial example of conditions between stages, the new pattern isn't limited to just controlling the platform differences of your builds but can be used with any build-arg. If you have used this pattern before, then you can effectively now define an "else" clause, whereas previously, you were limited to only "if" clauses.

Copy with keeping parent directories

The following feature has been released in the "labs" channel. Define the following at the top of your Dockerfile to use this feature:

```dockerfile
#syntax=docker/dockerfile:1.7-labs
```

When you are copying files in your Dockerfile, for example, do this:

```dockerfile
COPY app/file /to/dest/dir/
```

This example means the source file is copied directly to the destination directory. If your source path was a directory, all the files inside that directory would be copied directly to the destination path. What if you have a file structure like the following:

```
.
├── app1
│   ├── docs
│   │   └── manual.md
│   └── src
│       └── server.go
└── app2
    └── src
        └── client.go
```

You want to copy only files in app1/src, but so that the final files at the destination would be /to/dest/dir/app1/src/server.go and not just /to/dest/dir/server.go. With the new COPY --parents flag, you can write:

```dockerfile
COPY --parents /app1/src/ /to/dest/dir/
```

This will copy the files inside the src directory and recreate the app1/src directory structure for these files.

Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write:

```dockerfile
COPY --parents */src/ /to/dest/dir/
```

This will create both /to/dest/dir/app1 and /to/dest/dir/app2, but it will not copy the docs directory. Previously, this kind of copy was not possible with a single command. You would have needed multiple copies for individual files (as shown in this example) or used some workaround with the RUN --mount instruction instead.

You can also use the double-star wildcard (**) to match files under any directory structure.
For example, to copy only the Go source code files anywhere in your build context, you can write:

```dockerfile
COPY --parents **/*.go /to/dest/dir/
```

If you are wondering why you would need to copy specific files instead of just using COPY ./ to copy all files, remember that your build cache gets invalidated when you include new files in your build. If you copy all files, the cache gets invalidated when any file is added or changed, whereas if you copy only Go files, only changes in these files influence the cache.

The new --parents flag is not only for COPY instructions from your build context; you can also use it in multi-stage builds when copying files between stages using COPY --from. Note that with COPY --from syntax, all source paths are expected to be absolute, meaning that if the --parents flag is used with such paths, they will be fully replicated as they were in the source stage. That may not always be desirable; instead, you may want to keep some parents but discard and replace others. In that case, you can use a special /./ relative pivot point in your source path to mark which parents you wish to copy and which should be ignored. This special path component resembles how rsync works with the --relative flag.

```dockerfile
#syntax=docker/dockerfile:1.7-labs
FROM ... AS base
RUN ./generate-lot-of-files -o /out/
# /out/usr/bin/foo
# /out/usr/lib/bar.so
# /out/usr/local/bin/baz

FROM scratch
COPY --from=base --parents /out/./**/bin/ /
# /usr/bin/foo
# /usr/local/bin/baz
```

The example above shows how only the bin directories are copied from the collection of files that the intermediate stage generated, but all the directories keep their paths relative to the out directory.

Exclusion filters

The following feature has been released in the "labs" channel. Define the following at the top of your Dockerfile to use this feature:

```dockerfile
#syntax=docker/dockerfile:1.7-labs
```

Another related case when moving files in your Dockerfile with COPY and ADD instructions is when you want to move a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or try to define your excluded files inside a .dockerignore file. .dockerignore files, however, are not a good solution for this problem: they only list the files excluded from the client-side build context, they do not apply to builds from remote Git/HTTP URLs, and they are limited to one per Dockerfile. You should use them similarly to .gitignore, to mark files that are never part of your project, but not as a way to define your application-specific build logic.

With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD commands directly in the Dockerfile. The pattern uses the same format as .dockerignore.

The following example copies all the files in a directory except Markdown files:

```dockerfile
COPY --exclude=*.md app /dest/
```

You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README:

```dockerfile
COPY --exclude=*.md --exclude=README app /dest/
```

Double-star wildcards exclude not only Markdown files in the copied directory but also in any subdirectory:

```dockerfile
COPY --exclude=**/*.md app /dest/
```

As in .dockerignore files, you can also define exceptions to the exclusions with a ! prefix. The following example excludes all Markdown files in any copied directory, except if the file is called important.md, in which case it is still copied:
```dockerfile
COPY --exclude=**/*.md --exclude=!**/important.md app /dest/
```

This double negative may be confusing initially, but note that this is a reversal of the previous exclude rule, and "include patterns" are defined by the source parameter of the COPY instruction.

When using --exclude together with the previously described --parents copy mode, note that the exclude patterns are relative to the copied parent directories, or to the pivot point /./ if one is defined. See the following directory structure for example:

```
assets
├── app1
│   ├── icons32x32
│   ├── icons64x64
│   ├── notes
│   └── backup
├── app2
│   └── icons32x32
└── testapp
    └── icons32x32
```

```dockerfile
COPY --parents --exclude=testapp assets/./**/icons* /dest/
```

This command would create the directory structure below. Note that only directories with the icons prefix were copied, the root parent directory assets was skipped as it was before the relative pivot point, and testapp was not copied because it was matched by an exclusion filter.

```
dest
├── app1
│   ├── icons32x32
│   └── icons64x64
└── app2
    └── icons32x32
```

Conclusion

We hope this post gave you ideas for improving your Dockerfiles and that the patterns shown here will help you describe your builds more efficiently. Remember that your Dockerfile can start using all these features today by defining the #syntax line on top, even if you haven't updated to the latest Docker yet.

For a full list of other features in the new BuildKit, Buildx, and Dockerfile releases, check out the changelogs:

- BuildKit v0.13.0
- Buildx v0.13.0
- Dockerfile v1.7.0
- Dockerfile v1.7.0-labs

Thanks to community members @tstenner, @DYefimov, and @leandrosansilva for helping to implement these features! If you have issues or suggestions you want to share, let us know in the issue tracker.

Learn more

- Subscribe to the Docker Newsletter.
- Get the latest release of Docker Desktop.
- Vote on what's next! Check out our public roadmap.
- Have questions? The Docker community is here to help.
- New to Docker? Get started.
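As a parting recap, the features covered above can be combined in a single Dockerfile. The following is only a sketch: the version, download URL, and project layout are hypothetical placeholders and not part of the examples above.

```dockerfile
#syntax=docker/dockerfile:1.7-labs

# Strip an optional "v" prefix from a hypothetical version argument.
ARG VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# Map the Go-style arm64 name to aarch64 for the made-up download URL below.
ARG ARCH=${TARGETARCH/arm64/aarch64}

FROM alpine AS build
ARG VERSION ARCH
# Hypothetical release archive; the naming scheme is invented for illustration.
ADD https://example.com/tool/${VERSION}/tool-${VERSION}-linux-${ARCH}.tar.gz /tmp/
# Copy Go sources while keeping parent directories, excluding test files.
COPY --parents --exclude=**/*_test.go **/*.go /src/
```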
-
At the heart of Docker's containerization process is the Dockerfile, a file that helps automate the creation of Docker images. In this blog post, we'll take a detailed look at what a Dockerfile is and how it works. Let's get started!

What is a Dockerfile?

A Dockerfile is a text file that contains instructions on how to build a Docker image. Each instruction is composed of a command followed by one or more arguments. By convention, commands are written in uppercase to distinguish them from arguments and make the Dockerfile more readable. Here is an example Dockerfile for a Node.js application:

```dockerfile
FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]
```

Here are the sequential tasks that are executed when building a Docker image from this Dockerfile:

1. Docker starts by looking for the base image specified in the FROM instruction (node:20.11.1) in the local cache. If it's not found locally, Docker fetches it from Docker Hub.
2. Next, Docker creates a working directory inside the container's filesystem as specified by the WORKDIR instruction (/app).
3. The COPY instruction copies package.json into the /app directory in the container. This is crucial for managing project dependencies.
4. Docker then executes the RUN npm install command to install the dependencies defined in package.json.
5. After the installation of dependencies, Docker copies the remaining project files into the /app directory with another COPY instruction.
6. Finally, the CMD instruction sets the default command to run inside the container (node server.js), which starts the application.

Want to learn more about building a Docker image using a Dockerfile? Check out this blog post: How to Build a Docker Image With Dockerfile From Scratch.

Common Dockerfile Instructions

Below, we discuss some of the most important commands commonly used in a Dockerfile:

- FROM: Specifies the base image for subsequent instructions. Every Dockerfile must start with a FROM command.
- ADD / COPY: Both commands enable the transfer of files from the host to the container's filesystem. The ADD instruction is particularly useful when adding files from remote URLs or for the automatic extraction of compressed files from the local filesystem directly into the container's filesystem. Note that Docker recommends using COPY over ADD, especially when transferring local files.
- WORKDIR: Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile. If the specified directory does not exist, it's created automatically.
- RUN: Executes commands during the build step of the image. It can be used to install necessary packages, update existing packages, and create users and groups, among other system configuration tasks within the container.
- CMD / ENTRYPOINT: Both provide default commands to be executed when a Docker image is run as a container. The main distinction is that the command set by ENTRYPOINT is not overridden by arguments passed to docker run, while the command set by CMD is.

For a comprehensive guide to all available Dockerfile instructions, refer to the official Docker documentation at Dockerfile reference.

Relationship Between Dockerfile Instructions and Docker Image Layers

Each instruction in a Dockerfile creates a new layer in the Docker image. These layers are stacked on top of each other, and each layer represents the change made from the layer below it.
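As an illustration (the annotations below are our own and not part of the original example), the sample Dockerfile above maps to layers roughly as follows:

```dockerfile
FROM node:20.11.1            # base image layers pulled from Docker Hub
WORKDIR /app                 # zero-byte metadata layer recording the working directory
COPY package.json /app       # new filesystem layer containing package.json
RUN npm install              # new filesystem layer with the installed node_modules
COPY . /app                  # new filesystem layer with the application code
CMD ["node", "server.js"]    # zero-byte metadata layer recording the default command
```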
The most important point to note here is that Docker caches these layers to speed up subsequent builds (more on this in the next section). As a general rule, any Dockerfile command that modifies the file system (such as FROM, RUN, and COPY) creates a new layer. Commands instructing how to build the image and run the container (such as WORKDIR, ENV, and ENTRYPOINT) add zero-byte-sized metadata layers to the created image.

To view the commands that create the image layers and the sizes they contribute to the Docker image, you can run the following command:

```console
docker history <IMAGE_NAME>
```

You can also run the following command to find out the number of image layers:

```console
docker inspect --format '{{json .RootFS.Layers}}' <IMAGE_NAME>
```

In this command, we use a Go template to extract the layers' information. For a deep dive into Docker image layers, check out our blog post: What Are Docker Image Layers and How Do They Work?

Dockerfile and Build Cache

When you build a Docker image using the Dockerfile, Docker checks each instruction (layer) against its build cache. If a layer has not changed (meaning the instruction and its context are identical to a previous build), Docker uses the cached layer instead of executing the instruction again.

Let's see this in action. Building a sample Node app using the Dockerfile from the previous section, the first build took 1244.2 seconds. Building the image again, without making any changes to the application code or Dockerfile, took just 6.9 seconds. The significant decrease in build time for the second build demonstrates Docker's effective use of the build cache: since there were no alterations in the Dockerfile instructions or the application code, Docker used the cached layers from the first build.

One more important point to note is that caching has a cascading effect. Once an instruction is modified, all subsequent instructions, even if unchanged, will be executed afresh because Docker can no longer guarantee their outcomes are the same as before. This characteristic of Docker's caching mechanism has significant implications for the organization of instructions within a Dockerfile. In the upcoming section on Dockerfile best practices, we'll learn how to strategically order Dockerfile instructions to optimize build times.

Best Practices for Writing Dockerfiles

Below, we discuss three recommended best practices you should follow when writing Dockerfiles:

#1 Use a .dockerignore file

When writing Dockerfiles, ensure that only the files and folders required for your application are copied to the container's filesystem. To help with this, create a .dockerignore file in the same directory as your Dockerfile. In this file, list all the files and directories that are unnecessary for building and running your application, similar to how you would use a .gitignore file to exclude files from a git repository. Not including irrelevant files in the Docker build context helps to keep the image size small. Smaller images bring significant advantages: they require less time and bandwidth to download, occupy less storage space on disk, and consume less memory when loaded into a Docker container.

#2 Keep the number of image layers relatively small

Another best practice to follow while writing Dockerfiles is to keep the number of image layers as low as possible, as this directly impacts the startup time of the container.
But how can we effectively reduce the number of image layers? A simple method is to consolidate multiple RUN commands into a single command. Let's say we have a Dockerfile that contains three separate commands like these:

```dockerfile
RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get clean
```

This will result in three separate layers. However, by merging these commands into one, as shown below, we can reduce the number of layers from three to one:

```dockerfile
RUN apt-get update && \
    apt-get install -y nginx && \
    apt-get clean
```

In this version, we use the && operator along with \ for line continuation. The && operator executes commands sequentially, ensuring that each command runs only if the previous one succeeds. This is critical for maintaining the build's integrity: the build stops if any command fails, preventing the creation of a defective image. The \ helps break up long commands into more readable segments.

#3 Order Dockerfile instructions to leverage caching as much as possible

We know that Docker uses the build cache to avoid rebuilding any image layers that it has already built and that do not contain any noticeable changes. Because of this caching strategy, the order in which you organize instructions within your Dockerfile has a direct impact on your build times. The best practice is to place instructions that are least likely to change towards the beginning and those that change more frequently towards the end of the Dockerfile. This strategy is grounded in how Docker rebuilds images: Docker checks each instruction in sequence against its cache. If it encounters a change in an instruction, it cannot use the cache for that instruction or any that follow. Instead, Docker rebuilds each layer from the point of change onwards.

Consider the Dockerfile below:

```dockerfile
FROM node:20.11.1
WORKDIR /app
COPY . /app
RUN npm install
CMD ["node", "server.js"]
```

It works fine, but there is an issue. On line 3, we copy the entire directory (including the application code) into the container. Following this, on line 4, we install the dependencies. This setup has a significant drawback: any modification to the application code invalidates the cache from that point onwards. As a result, dependencies are reinstalled with each build. This process is not only time-consuming but also unnecessary, considering that dependency updates occur less frequently than changes to the application code.

To better leverage Docker's cache, we can adjust our approach by initially copying only the package.json file to install dependencies, followed by copying the rest of the application code:

```dockerfile
FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]
```

This modification means that changes to the application code now only affect the cache from line 5 onwards. The installation of dependencies, happening before this, benefits from cache retention (unless there are changes to package.json), thus optimizing the build time.

Conclusion

In this blog post, we began by defining what a Dockerfile is, followed by a discussion of the most frequently used commands within a Dockerfile. We then explored the relationship between Dockerfile instructions and Docker image layers, as well as the concept of the build cache and how Docker employs it to improve build times. Lastly, we outlined three recommended best practices for writing Dockerfiles. With these insights, you now have the knowledge required to write efficient Dockerfiles.
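As a final companion to best practice #1 above, here is a sketch of what a .dockerignore for the Node.js example might contain; the entries are illustrative and should be adapted to your own project:

```
# Hypothetical .dockerignore for the Node.js example
node_modules
npm-debug.log
.git
.env
Dockerfile
.dockerignore
```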
Interested in learning more about Docker? Check out the following courses from KodeKloud:

- Docker for the Absolute Beginner
- Docker Certified Associate Exam Course
-
Today we're featuring a blog from Adam Gordon Bell at Earthly, who writes about how BuildKit, a technology developed by Docker and the community, works and how to write a simple frontend. Earthly uses BuildKit in their product.

Introduction

How are containers made? Usually, from a series of statements like `RUN`, `FROM`, and `COPY`, which are put into a Dockerfile and built. But how are those commands turned into a container image and then a running container? We can build up an intuition for how this works by understanding the phases involved and creating a container image ourselves. We will create an image programmatically and then develop a trivial syntactic frontend and use it to build an image.

On `docker build`

We can create container images in several ways. We can use Buildpacks, we can use build tools like Bazel or sbt, but by far the most common way images are built is using `docker build` with a Dockerfile. The familiar base images Alpine, Ubuntu, and Debian are all created this way. Here is an example Dockerfile:

```dockerfile
FROM alpine
COPY README.md README.md
RUN echo "standard docker build" > /built.txt
```

We will be using variations on this Dockerfile throughout this tutorial. We can build it like this:

```console
docker build . -t test
```

But what is happening when you call `docker build`? To understand that, we will need a little background.

Background

A docker image is made up of layers. Those layers form an immutable filesystem. A container image also has some descriptive data, such as the start-up command, the ports to expose, and volumes to mount. When you `docker run` an image, it starts up inside a container runtime.

I like to think about images and containers by analogy. If an image is like an executable, then a container is like a process. You can run multiple containers from one image, and a running image isn't an image at all but a container.

Continuing our analogy, BuildKit is a compiler, just like LLVM. But whereas a compiler takes source code and libraries and produces an executable, BuildKit takes a Dockerfile and a file path and creates a container image. Docker build uses BuildKit to turn a Dockerfile into a docker image, OCI image, or another image format. In this walk-through, we will primarily use BuildKit directly.

This primer on using BuildKit supplies some helpful background on using BuildKit, `buildkitd`, and `buildctl` via the command line. However, the only prerequisite for today is running `brew install buildkit` or the appropriate OS equivalent steps.

How Do Compilers Work?

A traditional compiler takes code in a high-level language and lowers it to a lower-level language. In most conventional ahead-of-time compilers, the final target is machine code. Machine code is a low-level programming language that your CPU understands.

Fun Fact: Machine Code vs. Assembly

Machine code is written in binary. This makes it hard for a human to understand. Assembly code is a plain-text representation of machine code that is designed to be somewhat human-readable. There is generally a 1-1 mapping between the instructions the machine understands (in machine code) and the OpCodes in Assembly.

Compiling the classic C "Hello, World" into x86 assembly code using the Clang frontend for LLVM can be pictured as a single step from source file to assembly. Creating an image from a Dockerfile works in a similar way: BuildKit is passed the Dockerfile and the build context, which is simply the present working directory. In simplified terms, each line in the Dockerfile is turned into a layer in the resulting image.
One significant way image building differs from compiling is this build context. A compiler's input is limited to source code, whereas `docker build` takes a reference to the host filesystem as an input and uses it to perform actions such as `COPY`.

There Is a Catch

The earlier picture of compiling "Hello, World" in a single step missed a vital detail. Computer hardware is not a singular thing. If every compiler were a hand-coded mapping from a high-level language to x86 machine code, then moving to the Apple M1 processor would be quite challenging because it has a different instruction set.

Compiler authors have overcome this challenge by splitting compilation into phases. The traditional phases are the frontend, the backend, and the middle. The middle phase is sometimes called the optimizer, and it deals primarily with an internal representation (IR). This staged approach means you don't need a new compiler for each new machine architecture. Instead, you just need a new backend. LLVM is an example of this structure.

Intermediate Representations

This multiple-backend approach allows LLVM to target ARM, X86, and many other machine architectures using LLVM Intermediate Representation (IR) as a standard protocol. LLVM IR is a human-readable programming language that backends need to be able to take as input. To create a new backend, you need to write a translator from LLVM IR to your target machine code. That translation is the primary job of each backend.

Once you have this IR, you have a protocol that various phases of the compiler can use as an interface, and you can build not just many backends but many frontends as well. LLVM has frontends for numerous languages, including C++, Julia, Objective-C, Rust, and Swift. If you can write a translation from your language to LLVM IR, LLVM can translate that IR into machine code for all the backends it supports. This translation function is the primary job of a compiler frontend.

In practice, there is much more to it than that. Frontends need to tokenize and parse input files, and they need to return pleasant errors. Backends often have target-specific optimizations to perform and heuristics to apply. But for this tutorial, the critical point is that having a standard representation ends up being a bridge that connects many frontends with many backends. This shared interface removes the need to create a compiler for every combination of language and machine architecture. It is a simple but very empowering trick!

BuildKit

Images, unlike executables, have their own isolated filesystem. Nevertheless, the task of building an image looks very similar to compiling an executable. They can have varying syntax (dockerfile1.0, dockerfile1.2), and the result must target several machine architectures (arm64 vs. x86_64).

"LLB is to Dockerfile what LLVM IR is to C" – BuildKit Readme

This similarity was not lost on the BuildKit creators. BuildKit has its own intermediate representation, LLB. And where LLVM IR has things like function calls and garbage-collection strategies, LLB has things like mounting filesystems and executing statements. LLB is defined as a protocol buffer, which means that BuildKit frontends can make gRPC requests against buildkitd to build a container directly.

Programmatically Making An Image

Alright, enough background. Let's programmatically generate the LLB for an image and then build an image.
Using Go

In this example, we will be using Go, which lets us leverage existing BuildKit libraries, but it's possible to accomplish this in any language with Protocol Buffer support.

Import the LLB definitions (along with the standard-library packages the later snippets use):

```go
import (
	"context"
	"os"

	"github.com/moby/buildkit/client/llb"
)
```

Create LLB for an Alpine image:

```go
func createLLBState() llb.State {
	return llb.Image("docker.io/library/alpine").
		File(llb.Copy(llb.Local("context"), "README.md", "README.md")).
		Run(llb.Args([]string{"/bin/sh", "-c", "echo \"programmatically built\" > /built.txt"})).
		Root()
}
```

We are accomplishing the equivalent of a `FROM` by using `llb.Image`. Then, we copy a file from the local file system into the image using `File` and `Copy`. Finally, we `RUN` a command to echo some text to a file. LLB has many more operations, but you can recreate many standard images with these three building blocks.

The final thing we need to do is turn this into a protocol buffer and emit it to standard out:

```go
func main() {
	dt, err := createLLBState().Marshal(context.TODO(), llb.LinuxAmd64)
	if err != nil {
		panic(err)
	}
	llb.WriteTo(dt, os.Stdout)
}
```

Let's look at what this generates using the `dump-llb` option of buildctl:

```console
go run ./writellb/writellb.go | buildctl debug dump-llb | jq .
```

We get this JSON-formatted LLB:

```json
{
  "Op": {
    "Op": {
      "source": {
        "identifier": "local://context",
        "attrs": { "local.unique": "s43w96rwjsm9tf1zlxvn6nezg" }
      }
    },
    "constraints": {}
  },
  "Digest": "sha256:c3ca71edeaa161bafed7f3dbdeeab9a5ab34587f569fd71c0a89b4d1e40d77f6",
  "OpMetadata": {
    "caps": { "source.local": true, "source.local.unique": true }
  }
}
{
  "Op": {
    "Op": {
      "source": { "identifier": "docker-image://docker.io/library/alpine:latest" }
    },
    "platform": { "Architecture": "amd64", "OS": "linux" },
    "constraints": {}
  },
  "Digest": "sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7",
  "OpMetadata": {
    "caps": { "source.image": true }
  }
}
{
  "Op": {
    "inputs": [
      { "digest": "sha256:665ba8b2cdc0cb0200e2a42a6b3c0f8f684089f4cd1b81494fbb9805879120f7", "index": 0 },
      { "digest": "sha256:c3ca71edeaa161bafed7f3dbdeeab9a5ab34587f569fd71c0a89b4d1e40d77f6", "index": 0 }
    ],
    "Op": {
      "file": {
        "actions": [
          {
            "input": 0,
            "secondaryInput": 1,
            "output": 0,
            "Action": {
              "copy": { "src": "/README.md", "dest": "/README.md", "mode": -1, "timestamp": -1 }
            }
          }
        ]
      }
    },
    "platform": { "Architecture": "amd64", "OS": "linux" },
    "constraints": {}
  },
  "Digest": "sha256:ba425dda86f06cf10ee66d85beda9d500adcce2336b047e072c1f0d403334cf6",
  "OpMetadata": {
    "caps": { "file.base": true }
  }
}
{
  "Op": {
    "inputs": [
      { "digest": "sha256:ba425dda86f06cf10ee66d85beda9d500adcce2336b047e072c1f0d403334cf6", "index": 0 }
    ],
    "Op": {
      "exec": {
        "meta": {
          "args": [ "/bin/sh", "-c", "echo \"programmatically built\" > /built.txt" ],
          "cwd": "/"
        },
        "mounts": [ { "input": 0, "dest": "/", "output": 0 } ]
      }
    },
    "platform": { "Architecture": "amd64", "OS": "linux" },
    "constraints": {}
  },
  "Digest": "sha256:d2d18486652288fdb3516460bd6d1c2a90103d93d507a9b63ddd4a846a0fca2b",
  "OpMetadata": {
    "caps": { "exec.meta.base": true, "exec.mount.bind": true }
  }
}
{
  "Op": {
    "inputs": [
      { "digest": "sha256:d2d18486652288fdb3516460bd6d1c2a90103d93d507a9b63ddd4a846a0fca2b", "index": 0 }
    ],
    "Op": null
  },
  "Digest": "sha256:fda9d405d3c557e2bd79413628a435da0000e75b9305e52789dd71001a91c704",
  "OpMetadata": {
    "caps": { "constraints": true, "platform": true }
  }
}
```

Looking through the output, we can see how our code maps to LLB.
Here is our `Copy` as part of a FileOp:

```json
"Action": {
  "copy": {
    "src": "/README.md",
    "dest": "/README.md",
    "mode": -1,
    "timestamp": -1
  }
}
```

Here is the mapping of our build context for use in our `COPY` command:

```json
"Op": {
  "source": {
    "identifier": "local://context",
    "attrs": { "local.unique": "s43w96rwjsm9tf1zlxvn6nezg" }
  }
}
```

Similarly, the output contains LLB that corresponds to our `RUN` and `FROM` commands.

Building Our LLB

To build our image, we must first start `buildkitd`:

```console
docker run --rm --privileged -d --name buildkit moby/buildkit
export BUILDKIT_HOST=docker-container://buildkit
```

Then we run our program and pipe its output to `buildctl`:

```console
go run ./writellb/writellb.go | buildctl build --local context=. --output type=image,name=docker.io/agbell/test,push=true
```

The output flag lets us specify what backend we want BuildKit to use. We will ask it to build an OCI image and push it to docker.io.

Real-World Usage

In a real-world tool, we might want to programmatically make sure `buildkitd` is running and send the RPC request directly to it, as well as provide friendly error messages. For tutorial purposes, we will skip all that. We can run the resulting image like this:

```console
> docker run -it --pull always agbell/test:latest /bin/sh
```

And we can then see the results of our programmatic `COPY` and `RUN` commands:

```console
/ # cat built.txt
programmatically built
/ # ls README.md
README.md
```

There we go! The full code example can be a great starting place for your own programmatic docker image building.

A True Frontend for BuildKit

A true compiler frontend does more than just emit hardcoded IR. A proper frontend takes in files, tokenizes them, parses them, generates a syntax tree, and then lowers that syntax tree into the internal representation. Mockerfiles are an example of such a frontend:

```yaml
#syntax=r2d4/mocker
apiVersion: v1alpha1
images:
- name: demo
  from: ubuntu:16.04
  package:
    install:
    - curl
    - git
    - gcc
```

And because Docker build supports the `#syntax` command, we can even build a Mockerfile directly with `docker build`:

```console
docker build -f mockerfile.yaml .
```

To support the #syntax command, all that is needed is to put the frontend in a docker image that accepts a gRPC request in the correct format and publish that image somewhere. At that point, anyone can use your frontend with `docker build` by just adding `#syntax=yourimagename`.

Building Our Own Example Frontend for `docker build`

Building a tokenizer and a parser as a gRPC service is beyond the scope of this article. But we can get our feet wet by extracting and modifying an existing frontend. The standard dockerfile frontend is easy to disentangle from the moby project. I've pulled the relevant parts out into a stand-alone repo. Let's make some trivial modifications to it and test it out.

So far, we've only used the docker commands `FROM`, `RUN`, and `COPY`. At a surface level, with its capitalized commands, Dockerfile syntax looks a lot like the programming language INTERCAL. Let's change these commands to their INTERCAL equivalents and develop our own Ickfile format.

| Dockerfile | Ickfile   |
|------------|-----------|
| FROM       | COME FROM |
| RUN        | PLEASE    |
| COPY       | STASH     |

The modules in the dockerfile frontend split the parsing of the input file into several discrete steps. For this tutorial, we are only going to make trivial changes to the frontend. We will leave all the stages intact and focus on customizing the existing commands to our tastes.
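Concretely, the goal is for an ickfile like the following to build the same image as the Dockerfile at the start of this post. This is only a sketch: the exact spelling of the renamed commands (for example COME_FROM versus COME FROM) is an assumption based on the constants defined in the next snippet.

```dockerfile
#syntax=agbell/ick
COME_FROM alpine
STASH README.md README.md
PLEASE echo "custom frontend built" > /built.txt
```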
To do this, all we need to do is change `command.go`:

```go
package command

// Define constants for the command strings
const (
	Copy = "stash"
	Run  = "please"
	From = "come_from"
	...
)
```

And we can then see the results of our `STASH` and `PLEASE` commands:

```console
/ # cat built.txt
custom frontend built
/ # ls README.md
README.md
```

I've pushed this image to Docker Hub. Anyone can start building images using our `ickfile` format by adding `#syntax=agbell/ick` to an existing Dockerfile. No manual installation is required!

Enabling BuildKit

BuildKit is enabled by default on Docker Desktop. It is not enabled by default in the current version of Docker for Linux (`version 20.10.5`). To instruct `docker build` to use BuildKit, set the environment variable `DOCKER_BUILDKIT=1` or change the Engine config.

Conclusion

We have learned that a three-phased structure borrowed from compilers powers building images, and that an intermediate representation called LLB is the key to that structure. Empowered by this knowledge, we have produced two frontends for building images.

This deep dive on frontends still leaves much to explore. If you want to learn more, I suggest looking into BuildKit workers. Workers do the actual building and are the secret behind `docker buildx` and multi-architecture builds. `docker build` also has support for remote workers and cache mounts, both of which can lead to faster builds. Earthly uses BuildKit internally for its repeatable build syntax. Without it, our containerized Makefile-like syntax would not be possible. If you want a saner CI process, then you should check it out.

There is also much more to explore about how modern compilers work. Modern compilers often have many stages and more than one intermediate representation, and they are often able to do very sophisticated optimizations.

The post Compiling Containers – Dockerfiles, LLVM and BuildKit appeared first on Docker Blog.
-