Showing results for tags 'docker'.

  1. Dockerfiles are fundamental tools for developers working with Docker, serving as a blueprint for creating Docker images. These text documents contain all the commands a user could call on the command line to assemble an image. Understanding and effectively utilizing Dockerfiles can significantly streamline the development process, allowing for the automation of image creation and ensuring consistent environments across different stages of development. Dockerfiles are pivotal in defining project environments, dependencies, and the configuration of applications within Docker containers. With new versions of the BuildKit builder toolkit, Docker Buildx CLI, and Dockerfile frontend for BuildKit (v1.7.0), developers now have access to enhanced Dockerfile capabilities. This blog post delves into these new Dockerfile capabilities and explains how you can leverage them in your projects to further optimize your Docker workflows.

Versioning

Before we get started, here's a quick reminder of how Dockerfile is versioned and what you should do to update it. Although most projects use Dockerfiles to build images, BuildKit is not limited to that format. BuildKit supports multiple different frontends for defining the build steps for BuildKit to process. Anyone can create these frontends, package them as regular container images, and load them from a registry when you invoke the build. With the new release, we have published two such images to Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs.

To use these frontends, you need to specify a #syntax directive at the beginning of the file to tell BuildKit which frontend image to use for the build. Here we have set it to use the latest of the 1.x.x major version. For example:

#syntax=docker/dockerfile:1
FROM alpine
...

This means that BuildKit is decoupled from the Dockerfile frontend syntax. You can start using new Dockerfile features right away without worrying about which BuildKit version you're using. All the examples described in this article will work with any version of Docker that supports BuildKit (the default builder as of Docker 23), as long as you define the correct #syntax directive at the top of your Dockerfile. You can learn more about Dockerfile frontend versions in the documentation.

Variable expansions

When you write Dockerfiles, build steps can contain variables that are defined using the build arguments (ARG) and environment variables (ENV) instructions. The difference between build arguments and environment variables is that environment variables are kept in the resulting image and persist when a container is created from it. When you use such variables, you most likely use ${NAME} or, more simply, $NAME in COPY, RUN, and other commands. You might not know that Dockerfile supports two forms of Bash-like variable expansion:

${variable:-word}: Sets a value to word if the variable is unset
${variable:+word}: Sets a value to word if the variable is set

Up to this point, these special forms were not that useful in Dockerfiles because the default value of ARG instructions can be set directly:

FROM alpine
ARG foo="default value"

If you are an expert in various shell applications, you know that Bash and other tools usually have many additional forms of variable expansion to ease the development of your scripts. In Dockerfile v1.7, we have added:

${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix from the variable's value.
${variable%pattern} and ${variable%%pattern} to remove the shortest or longest suffix from the variable's value.
${variable/pattern/replacement} to replace the first occurrence of a pattern
${variable//pattern/replacement} to replace all occurrences of a pattern

How these rules are used might not be completely obvious at first. So, let's look at a few examples seen in actual Dockerfiles. For example, projects often can't agree on whether versions for downloading your dependencies should have a "v" prefix or not. The following allows you to get the format you need:

# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'

In the next example, multiple variants are used by the same project:

ARG VERSION=v1.7.13
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz /

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables like TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, in containers and the Go ecosystem, we refer to 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /

In this case, the URL also uses a custom name for AMD64 architecture. To pass a variable through multiple expansions, use another ARG definition with an expansion from the previous value. You could also write all the definitions on a single line, as ARG allows multiple parameters, but that may hurt readability.

ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${ARCH}.zip /

Note that the example above is written in a way that if a user passes their own --build-arg ARCH=value, then that value is used as-is.

Now, let's look at how new expansions can be useful in multi-stage builds. One of the techniques described in "Advanced multi-stage build patterns" shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and want to run additional COPY or RUN commands only for specific platforms. If this method is new to you, you can learn more about it from that post. In summarized form, the idea is to define a global build argument and then define build stages that use the build argument value in the stage name while pointing to the base of your target stage via the build-arg name. Old example:

ARG BUILD_VERSION=1
FROM alpine AS base
RUN …
FROM base AS branch-version-1
RUN touch version1
FROM base AS branch-version-2
RUN touch version2
FROM branch-version-${BUILD_VERSION} AS after-condition
FROM after-condition
RUN …

When using this pattern for multi-platform builds, one of the limitations is that all the possible values for the build-arg need to be defined by your Dockerfile. This is problematic, as we want the Dockerfile to be built in a way that it can build on any platform and not be limited to a specific set. You can see other examples here and here of Dockerfiles where dummy stage aliases must be defined for all architectures, and no other architecture can be built. Instead, the pattern we would like to use is that one architecture has a special behavior, and everything else shares another common behavior.
With new expansions, we can write this to demonstrate running special commands only on RISC-V, which is still somewhat new and may need custom behavior: #syntax=docker/dockerfile:1.7 ARG ARCH=${TARGETARCH#riscv64} ARG ARCH=${ARCH:+"common"} ARG ARCH=${ARCH:-$TARGETARCH} FROM --platform=$BUILDPLATFORM alpine AS base-common ARG TARGETARCH RUN echo "Common build, I am $TARGETARCH" > /out FROM --platform=$BUILDPLATFORM alpine AS base-riscv64 ARG TARGETARCH RUN echo "Riscv only special build, I am $TARGETARCH" > /out FROM base-${ARCH} AS base Let’s look at these ARCH definitions more closely. The first sets ARCH to TARGETARCH but removes riscv64 from the value. Next, as we described previously, we don’t actually want the other architectures to use their own values but instead want them all to share a common value. So, we set ARCH to common except if it was cleared from the previous riscv64 rule. Now, if we still have an empty value, we default it back to $TARGETARCH. The last definition is optional, as we would already have a unique value for both cases, but it makes the final stage name base-riscv64 nicer to read. Additional examples of including multiple conditions with shared conditions, or conditions based on architecture variants can be found in this GitHub Gist page. Comparing this example to the initial example of conditions between stages, the new pattern isn’t limited to just controlling the platform differences of your builds but can be used with any build-arg. If you have used this pattern before, then you can effectively now define an “else” clause, whereas previously, you were limited to only “if” clauses. Copy with keeping parent directories The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature. #syntax=docker/dockerfile:1.7-labs When you are copying files in your Dockerfile, for example, do this: COPY app/file /to/dest/dir/ This example means the source file is copied directly to the destination directory. If your source path was a directory, all the files inside that directory would be copied directly to the destination path. What if you have a file structure like the following: . ├── app1 │ ├── docs │ │ └── manual.md │ └── src │ └── server.go └── app2 └── src └── client.go You want to copy only files in app1/src, but so that the final files at the destination would be /to/dest/dir/app1/src/server.go and not just /to/dest/dir/server.go. With the new COPY --parents flag, you can write: COPY --parents /app1/src/ /to/dest/dir/ This will copy the files inside the src directory and recreate the app1/src directory structure for these files. Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write: COPY --parents */src/ /to/dest/dir/ This will create both /to/dest/dir/app1 and /to/dest/dir/app2, but it will not copy the docs directory. Previously, this kind of copy was not possible with a single command. You would have needed multiple copies for individual files (as shown in this example) or used some workaround with the RUN --mount instruction instead. You can also use double-star wildcard (**) to match files under any directory structure. 
For example, to copy only the Go source code files anywhere in your build context, you can write: COPY --parents **/*.go /to/dest/dir/ If you are thinking about why you would need to copy specific files instead of just using COPY ./ to copy all files, remember that your build cache gets invalidated when you include new files in your build. If you copy all files, the cache gets invalidated when any file is added or changed, whereas if you copy only Go files, only changes in these files influence the cache. The new --parents flag is not only for COPY instructions from your build context, but obviously, you can also use them in multi-stage builds when copying files between stages using COPY --from. Note that with COPY --from syntax, all source paths are expected to be absolute, meaning that if the --parents flag is used with such paths, they will be fully replicated as they were in the source stage. That may not always be desirable, and instead, you may want to keep some parents but discard and replace others. In that case, you can use a special /./ relative pivot point in your source path to mark which parents you wish to copy and which should be ignored. This special path component resembles how rsync works with the --relative flag. #syntax=docker/dockerfile:1.7-labs FROM ... AS base RUN ./generate-lot-of-files -o /out/ # /out/usr/bin/foo # /out/usr/lib/bar.so # /out/usr/local/bin/baz FROM scratch COPY --from=base --parents /out/./**/bin/ / # /usr/bin/foo # /usr/local/bin/baz This example above shows how only bin directories are copied from the collection of files that the intermediate stage generated, but all the directories will keep their paths relative to the out directory. Exclusion filters The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature: #syntax=docker/dockerfile:1.7-labs Another related case when moving files in your Dockerfile with COPY and ADD instructions is when you want to move a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or try to define your excluded files inside a .dockerignore file. .dockerignore files, however, are not a good solution for this problem, because they only list the files excluded from the client-side build context and not from builds from remote Git/HTTP URLs and are limited to one per Dockerfile. You should use them similarly to .gitignore to mark files that are never part of your project but not as a way to define your application-specific build logic. With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD commands directly in the Dockerfile. The pattern uses the same format as .dockerignore. The following example copies all the files in a directory except Markdown files: COPY --exclude=*.md app /dest/ You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README: COPY --exclude=*.md --exclude=README app /dest/ Double-star wildcards exclude not only Markdown files in the copied directory but also in any subdirectory: COPY --exclude=**/*.md app /dest/ As in .dockerignore files, you can also define exceptions to the exclusions with ! prefix. The following example excludes all Markdown files in any copied directory, except if the file is called important.md — in that case, it is still copied. 
COPY --exclude=**/*.md --exclude=!**/important.md app /dest/ This double negative may be confusing initially, but note that this is a reversal of the previous exclude rule, and “include patterns” are defined by the source parameter of the COPY instruction. When using --exclude together with previously described --parents copy mode, note that the exclude patterns are relative to the copied parent directories or to the pivot point /./ if one is defined. See the following directory structure for example: assets ├── app1 │ ├── icons32x32 │ ├── icons64x64 │ ├── notes │ └── backup ├── app2 │ └── icons32x32 └── testapp └── icons32x32 COPY --parents --exclude=testapp assets/./**/icons* /dest/ This command would create the directory structure below. Note that only directories with the icons prefix were copied, the root parent directory assets was skipped as it was before the relative pivot point, and additionally, testapp was not copied as it was defined with an exclusion filter. dest ├── app1 │ ├── icons32x32 │ └── icons64x64 └── app2 └── icons32x32 Conclusion We hope this post gave you ideas for improving your Dockerfiles and that the patterns shown here will help you describe your build more efficiently. Remember that your Dockerfile can start using all these features today by defining the #syntax line on top, even if you haven’t updated to the latest Docker yet. For a full list of other features in the new BuildKit, Buildx, and Dockerfile releases, check out the changelogs: BuildKit v0.13.0 Buildx v0.13.0 Dockerfile v1.7.0 v1.7.0-labs Thanks to community members @tstenner, @DYefimov, and @leandrosansilva for helping to implement these features! If you have issues or suggestions you want to share, let us know in the issue tracker. Learn more Subscribe to the Docker Newsletter. Get the latest release of Docker Desktop. Vote on what’s next! Check out our public roadmap. Have questions? The Docker community is here to help. New to Docker? Get started. View the full article
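As a footnote to the article above, here is a hedged sketch of how such a build might be invoked from the CLI, including the --build-arg override behavior the article mentions (the image name and platform list are illustrative, not from the original post):

# Multi-platform build; a builder that supports multi-platform output is assumed
docker buildx build \
  --platform linux/amd64,linux/arm64,linux/riscv64 \
  -t example/myapp:dev .

# Passing --build-arg ARCH=... skips the expansions and uses the value as-is,
# as noted in the bun example above
docker buildx build --build-arg ARCH=x64 -t example/myapp:dev .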
  2. With the advent of containerization, software developers could build, deploy, and scale applications in a transformative way. Docker quickly became the leading containerization platform, and Docker images remain among the most common container images used today. Docker Hub is one of the prominent locations for developers and enterprises to publish and distribute Docker images. With popularity comes greater attention from cybercriminals. Cybernews recently reported that 5,500 out of 10,000 public Docker images contained 48,000+ sensitive secrets - a combination of harmless and potentially vulnerable API keys. This report illustrates why it's imperative that security and platform teams know the most common attack vectors for their Docker containers and understand how to close them. This post will provide a brief checklist of the various attack vectors into your Docker containers specifically originating from exposed secrets.

Docker and exposed secrets

Let's quickly examine the relationship between container runtime and registry. When we spin up a container, an image is pulled from the registry via APIs and is deployed. This is visualized below:

(Image source: https://community.sap.com/legacyfs/online/storage/blogattachments/2022/09/1-83.png)

The high number of secrets from the Cybernews report is attributed to developers re-using packages from a registry containing sensitive secrets. Secrets are commonly found in the container image metadata - the environment variables and filesystem. Also, source code leakage could allow attackers to generate newer valid tokens that could provide unauthorized system access.

Attack surface

An attack surface is a collection of all vulnerable points an attacker can use to enter the target system. Attackers skillfully exploit these vulnerable points in technology and human behavior to access sensitive assets. We need to understand two Docker concepts as we continue this discussion:

Filesystem: In Docker, each layer can contain directory changes. The most commonly used filesystem, OverlayFS, enables Docker to overlay these layers to create a unified filesystem for a container.
Layers: Docker images are created in layers - i.e., each command in the Dockerfile corresponds to a layer.

With that context, let's understand and analyze how exposed secrets can affect these Docker image attack vectors.

Docker image layers

Secrets explicitly declared in the Dockerfile or build arguments can easily be accessed via the Docker image history command.

#terminal
docker image history

This represents one of the simplest methods for an attacker to capitalize on a secret.

Filesystem

This Dockerfile demonstrates a scenario where sensitive data like an SSH private key and secrets.txt are added to the container's filesystem and later removed.

#Dockerfile
FROM nginx:latest

# Copy in SSH private key, then delete it; this is INSECURE,
# the secret will still be in the image.
COPY id_rsa .
RUN rm -r id_rsa

ARG DB_USERNAME
ENV DB_USERNAME=$DB_USERNAME
ARG DB_PASSWORD
ENV DB_PASSWORD=$DB_PASSWORD
ARG API_KEY
ENV API_KEY=$API_KEY

# Expose secrets via a publicly accessible endpoint (insecure practice)
RUN echo "DB_USERNAME=$DB_USERNAME" > /usr/share/nginx/html/secrets.txt
RUN echo "DB_PASSWORD=$DB_PASSWORD" >> /usr/share/nginx/html/secrets.txt
RUN echo "API_KEY=$API_KEY" >> /usr/share/nginx/html/secrets.txt
RUN rm /usr/share/nginx/html/secrets.txt

CMD ["nginx", "-g", "daemon off;"]

Docker uses layer caching - hence, the secret is still available in one of the layers.
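To make the layer-caching point concrete, here is a hedged sketch of how the "deleted" key could still be recovered from a saved copy of such an image (the image name is illustrative; the exact archive layout inside the tar varies by Docker version):

# Export the image; every layer ships as its own tar archive inside it
docker save myapp:1.0 -o myapp.tar
mkdir myapp-extracted && tar -xf myapp.tar -C myapp-extracted

# Grep each layer's file listing for the supposedly deleted private key
find myapp-extracted -type f | while read f; do
  tar -tf "$f" 2>/dev/null | grep -q "id_rsa" && echo "id_rsa still present in: $f"
done

# Build arguments and RUN lines are likewise visible in the image history
docker history --no-trunc myapp:1.0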
An internal attacker can also extract individual layers of a Docker image, stored as tar files in registries, which enables them to uncover hidden secrets. After creating a Dockerfile, developers sometimes mistakenly pass secrets as build arguments when creating an image. For the above Dockerfile, the secrets are passed in as arguments:

#terminal
docker build \
  --build-arg DB_USERNAME=root \
  --build-arg DB_PASSWORD=Xnkfnbgf \
  --build-arg API_KEY=PvL4FjrrSXyT7qr \
  -t myapp:1.0 .

While convenient, this is not secure, since the arguments also get embedded in the image. A simple docker history --no-trunc <image> can expose the secret values. Developers should either use multi-stage builds or secret managers.

Environment variables

Apart from Docker image access, unauthorized access to the source code behind the Docker image can provide additional attack vectors. The .env files are primarily used to store secrets such as API tokens, database credentials, and other forms of secrets that an application needs. When attackers have access to secrets in the .env file, they can make whatever unauthorized accesses those secrets allow.

Dockerfile

A Dockerfile is a standard file that contains execution instructions to build an image when spinning up containers. Hard-coding secrets into a Dockerfile creates a significant attack surface. When attackers access the Dockerfile, they can see hard-coded secrets, the base image, the list of dependencies, and critical file locations. Developers need to use appropriate secret managers to reference variables.

docker-compose.yml

Docker Compose defines networks, services, and storage volumes. When an attacker views the file, they can understand the application architecture and exploit or disrupt its operation.

services:
  web:
    image: my-web-app:latest
    ports:
      - "80:80"
    networks:
      - app-network
  db:
    image: postgres:latest
    volumes:
      - db-data:/var/lib/postgresql/data
    environment:
      POSTGRES_PASSWORD: example_db_password
networks:
  app-network:
volumes:
  db-data:

In the above docker-compose.yml, the Postgres database password is hardcoded. The password can easily be accessed with the docker exec command as shown below:

#terminal
docker exec -it <container> env

Apart from secrets, an attacker can also analyze the volume mappings and identify potential points of weakness. If they discover that the database volume (db-data) is also mounted to the host filesystem, they could exploit it and perform a container breakout attack, gaining access to the underlying host system.

CI/CD config files

CI/CD configuration files such as .gitlab-ci.yml, azure-pipelines.yml, Jenkinsfile, etc., contain instructions for building, testing, and deploying applications. The logs generated in a CI/CD pipeline can contain debugging and logging information. If a developer includes a logging statement that inadvertently prints a sensitive secret, it can lead to unauthorized exposure and compromise. Such secret leaks need to be detected so that developers can fix their source code. Developers also tend to leave the registry login credentials in the CI/CD configuration files. Consider the following gitlab-ci.yml:

variables:
  DOCKER_IMAGE_TAG: latest
  DOCKER_REGISTRY_URL: registry.example.com
  DOCKER_IMAGE_NAME: my-docker-image
  DOCKER_REGISTRY_USER: adminuser <-- should use $CI_REGISTRY_USER
  DOCKER_REGISTRY_PASSWORD: secretpassword <-- should use $CI_REGISTRY_PASSWORD

# Jobs
build:
  stage: build
  image: image_name:stable
  script:
    - docker build -t $DOCKER_REGISTRY_URL/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG .
    - docker login -u $DOCKER_REGISTRY_USER -p $DOCKER_REGISTRY_PASSWORD $DOCKER_REGISTRY_URL
    - docker push $DOCKER_REGISTRY_URL/$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG

In the configuration above, the developer logs in to the Docker registry using a hardcoded username and password, leading to unwarranted secret exposure. A good development practice is to integrate the CI/CD environment with secret managers like HashiCorp Vault.

Detecting secrets

Older or unused Docker images can contain unpatched vulnerabilities or outdated dependencies, posing security risks to the enterprise. Regularly scanning and removing unused images helps mitigate these risks by reducing the attack surface and ensuring that only secure images are deployed. Enterprises also need to be actively using secret scanners to detect secrets in Docker images, whether they're stored in Docker Hub, JFrog Artifactory, AWS ECR, or any other repository. HCP Vault Radar can meet these requirements and is an excellent choice since it's an add-on to the most popular secrets manager: HashiCorp Vault. Vault Radar analyzes the contents of each layer described in this post to identify secrets in the software packages and dependencies.

Learn more

Vault Radar can scan your container images and other destinations such as source code, productivity applications like Jira, Confluence, and Slack, Terraform variables, server directories, and more. When it detects leaked secrets, it has options to remediate them and enhance your security posture. You can sign up now for an HCP Vault Radar test run to detect secret sprawl in your enterprise, or learn more about HashiCorp Vault and Vault Radar on our homepage. This post was originally published on Dev.to. View the full article
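A safer alternative worth noting alongside the article's recommendations: BuildKit secret mounts expose a value to a single RUN step without writing it into any image layer. A minimal, hedged sketch (ids and paths are illustrative):

# syntax=docker/dockerfile:1
FROM alpine
# The secret is mounted at /run/secrets/<id> only for this RUN step
# and is never committed to a layer
RUN --mount=type=secret,id=db_password \
    DB_PASSWORD="$(cat /run/secrets/db_password)" && \
    echo "using the password here without persisting it"

# Build command, reading the secret from a file kept outside the build context:
# docker build --secret id=db_password,src=/path/to/db_password.txt -t myapp:1.0 .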
  3. Docker Official Images are a curated set of Docker repositories hosted on Docker Hub that provide a wide range of pre-configured images for popular language runtimes and frameworks, cloud-first utilities, data stores, and Linux distributions. These images are maintained and vetted, ensuring they meet best practices for security, usability, and versioning, making it easier for developers to deploy and run applications consistently across different environments. Docker Official Images are an important component of Docker’s commitment to the security of both the software supply chain and open source software. Docker Official Images provide thousands of images you can use directly or as a base image when building your own images. For example, there are Docker Official Images for Alpine Linux, NGINX, Ubuntu, PostgreSQL, Python, and Node.js. Visit Docker Hub to search through the currently available Docker Official Images. In this blog post, we address three common misconceptions about Docker Official Images and outline seven ways they help secure the software supply chain. 3 common misconceptions about Docker Official Images Even though Docker Official Images have been around for more than a decade and have been used billions of times, they are somewhat misunderstood. Who “owns” Docker Official Images? What is with all those tags? How should you use Docker Official Images? Let’s address some of the more common misconceptions. Misconception 1: Docker Official Images are controlled by Docker Docker Official Images are maintained through a partnership between upstream maintainers, community volunteers, and Docker engineers. External developers maintain the majority of Docker Official Images Dockerfiles, with Docker engineers providing insight and review to ensure best practices and uniformity across the Docker Official Images catalog. Additionally, Docker provides and maintains the Docker Official Images build infrastructure and logic, ensuring consistent and secure build environments that allow Docker Official Images to support more than 10 architecture/operating system combinations. Misconception 2: Docker Official Images are designed for a single use case Most Docker Official Images repositories offer several image variants and maintain multiple supported versions. In other words, the latest tag of a Docker Official Image might not be the right choice for your use case. Docker Official Images tags The documentation for each Docker Official Images repository contains a “Supported tags and respective Dockerfile links” section that lists all the current tags with links to the Dockerfiles that created the image with those tags (Figure 1). This section can be a little intimidating for first-time users, but keeping in mind a few conventions will allow even novices to understand what image variants are available and, more importantly, which variant best fits their use case. Figure 1: Documentation showing the current tags with links to the Dockerfiles that created the image with those tags. Tags listed on the same line all refer to the same underlying image. (Multiple tags can point to the same image.) For example, Figure 1 shows the ubuntu Docker Official Images repository, where the 20.04, focal-20240216, and focal tags all refer to the same image. Often the latest tag for a Docker Official Images repository is optimized for ease of use and includes a wide variety of software helpful, but not strictly necessary, when using the main software packaged in the Docker Official Image. 
For example, latest images often include tools like Git and build tools. Because of their ease of use and wide applicability, latest images are often used in getting-started guides. Some operating system and language runtime repositories offer “slim” variants that have fewer packages installed and are therefore smaller. For example, the python:3.12.2-bookworm image contains not only the Python runtime, but also any tool you might need to build and package your Python application — more than 570 packages! Compare this to the python:3.12.2-slim-bookworm image, which has about 150 packages. Many Docker Official Images repositories offer “alpine” variants built on top of the Alpine Linux distribution rather than Debian or Ubuntu. Alpine Linux is focused on providing a small, simple, and secure base for container images, and Docker Official Images alpine variants typically aim to install only necessary packages. As a result, Docker Official Images alpine variants are typically even smaller than “slim” variants. For example, the linux/amd64 node:latest image is 382 MB, the node:slim image is 70 MB, and the node:alpine image is 47 MB. If you see tags with words that look like Toy Story characters (for example, bookworm, bullseye, and trixie) or adjectives (such as jammy, focal, and bionic), those indicate the codename of the Linux distribution they use as a base image. Debian-release codenames are based on Toy Story characters, and Ubuntu releases use alliterative adjective-animal appellations. Linux distribution indicators are helpful because many Docker Official Images provide variants built upon multiple underlying distribution versions (for example, postgres:bookworm and postgres:bullseye). Tags may contain other hints to the purpose of their image variant. Often these are explained later in the Docker Official Images repository documentation. Check the “How to use this image” and/or “Image Variants” sections. Misconception 3: Docker Official Images do not follow software development best practices Some critics argue that Docker Official Images go against the grain of best practices, such as not running container processes as root. While it’s true that we encourage users to embrace a few opinionated standards, we also recognize that different use cases require different approaches. For example, some use cases may require elevated privileges for their workloads, and we provide options for them to do so securely. 7 ways Docker Official Images help secure the software supply chain We recognize that security is a continuous process, and we’re committed to providing the best possible experience for our users. Since the company’s inception in 2013, Docker has been a leader in the software supply chain, and our commitment to security — including open source security — has helped to protect developers from emerging threats all along the way. With the availability of open source software, efficiently building powerful applications and services is easier than ever. The transparency of open source allows unprecedented insight into the security posture of the software you create. But to take advantage of the power and transparency of open source software, fully embracing software supply chain security is imperative. A few ways Docker Official Images help developers build a more secure software supply chain include: Open build process Because visibility is an important aspect of the software supply chain, Docker Official Images are created from a transparent and open build process. 
The Dockerfile inputs and build scripts are all open source, all Docker Official Images updates go through a public pull request process, and the logs from all Docker Official Images builds are available to inspect (Jenkins / GitHub Actions). Principle of least privilege The Docker Official Images build system adheres strictly to the principle of least privilege (POLP), for example, by restricting writes for each architecture to architecture-specific build agents. Updated build system Ensuring the security of Docker Official Images builds and images is paramount. The Docker Official Images build system is kept up to date through automated builds, regular security audits, collaboration with upstream projects, ongoing testing, and security patches. Vulnerability reports and continuous monitoring Courtesy of Docker Scout, vulnerability insights are available for all Docker Official Images and are continuously updated as new vulnerabilities are discovered. We are committed to continuously monitoring our images for security issues and addressing them promptly. For example, we were among the first to provide reasoned guidance and remediation for the recent xz supply chain attack. We also use insights and remediation guidance from Docker Scout, which surfaces actionable insights in near-real-time by updating CVE results from 20+ CVE databases every 20-60 minutes. Software Bill of Materials (SBOM) and provenance attestations We are committed to providing a complete and accurate SBOM and detailed build provenance as signed attestations for all Docker Official Images. This allows our users to have confidence in the origin of Docker Official Images and easily identify and mitigate any potential vulnerabilities. Signature validation We are working on integrating signature validation into our image pull and build processes. This will ensure that all Docker Official Images are verified before use, providing an additional layer of security for our users. Increased update frequency Docker Official Images provide the best of both worlds: the latest version of the software you want, built upon stable versions of Linux distributions. This allows you to use the latest features and fixes of the software you are running without having to wait for a new package from your Linux distribution or being forced to use an unstable version of your Linux distribution. Further, we are working to increase the throughput of the Docker Official Images build infrastructure to allow us to support more frequent updates for larger swaths of Docker Official Images. As part of this effort, we are piloting builds on GitHub Actions and Docker Build Cloud. Conclusion Docker’s leadership in security and protecting open source software has been established through Docker Official Images and other trusted content we provide our customers. We take a comprehensive approach to security, focusing on best practices, tooling, and community engagement, and we work closely with upstream projects and SIGs to address security issues promptly and proactively. Docker Official Images provide a flexible and secure way for developers to build, ship, test, and run their applications. Docker Official Images are maintained through a partnership between the Docker Official Images community, upstream maintainers/volunteers, and Docker engineers, ensuring best practices and uniformity across the Docker Official Images catalog. 
Each Docker Official Image offers numerous image variants that cater to different use cases, with tags indicating the purpose of each variant. Developers can build using Docker tools and products with confidence, knowing that their applications are built on a secure, transparent foundation. Looking to dive in? Get started building with Docker Official Images today. Learn more Browse the available Docker Official Images. Visit hub.docker.com and start developing. Find Docker Official Images on GitHub. Learn more about Docker Hub. Read: Debian’s Dedication to Security: A Robust Foundation for Docker Developers How to Use the Postgres Docker Official Image How to Use the NGINX Docker Official Image How to Use the Alpine Docker Official Image How to Use the Node Docker Official Image View the full article
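If you want to verify some of the claims above on your own machine, a few hedged CLI probes (tags and sizes will drift over time, and the attestation format fields assume a reasonably recent Buildx release):

# Compare the size of the node variants discussed above
docker pull node:latest && docker pull node:slim && docker pull node:alpine
docker images node

# Inspect the provenance and SBOM attestations attached to an image
docker buildx imagetools inspect node:latest --format '{{ json .Provenance }}'
docker buildx imagetools inspect node:latest --format '{{ json .SBOM }}'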
  4. In the current era, Docker is an indispensable tool for developers to improve productivity. Docker is an application that allows packaging and running applications in an isolated environment. The isolated environment is the container; you can have multiple containers on one host. This post guides you through the steps to follow to get Docker installed on Ubuntu 24.04 quickly.

How to Install Docker on Ubuntu 24.04 (Noble Numbat)

Installing Docker on Ubuntu 24.04 is easy. You only need access to a user account with admin privileges and an internet connection. Again, the steps to follow will differ depending on your installation method. In this case, we have two methods of installing Docker on Ubuntu 24.04. Let's discuss each in detail.

Method 1: Install Docker from Its Official Repository

There are numerous benefits to installing the latest stable Docker version, including access to new features. For someone looking to have the latest Docker version installed, you must access it from the official Docker repository. However, this method requires running more commands than the second method in the next section. Nonetheless, let's go through the process step by step.

Step 1: Update the Repository

To ensure we prepare our system to retrieve the latest packages, run the below command to update the repository.

$ sudo apt update

You will be required to authenticate the process by entering your password.

Step 2: Install Prerequisites

Before installing Docker, other prerequisite packages must be installed. For instance, we need the curl utility to download the GPG key. The below command handles the installation of all the prerequisite packages.

$ sudo apt install apt-transport-https ca-certificates curl software-properties-common

Step 3: Add Docker's GPG Key

Using curl, we must add the Docker repository GPG key. Doing so ensures that we can use the key to check the authenticity of the software package before installing it. Add it using the following command.

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

Step 4: Include the Docker Repository in your APT Sources

When you run the install command, Ubuntu checks the sources list to fetch a package. Thus, we must add Docker's repository to the system's source list with the below command.

$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

After adding the Docker repository, run the update command to refresh the sources list.

$ sudo apt update

Step 5: Verify the Installation Source

As a last step before installing Docker, use the below command to confirm that the package will be installed from the Docker repository we've added and not the one available in the Ubuntu repository. This way, you will access the latest Docker version.

$ apt-cache policy docker-ce

From the output, you will see the latest available version for your system.

Step 6: Install Docker

At this point, we can install Docker from the official repository by running the below command.

$ sudo apt install docker-ce -y

Step 7: Verify the Installed Docker

One way of ascertaining that we've successfully installed Docker on Ubuntu 24.04 is to check its status using systemctl. Use the following command.

$ sudo systemctl status docker

Another way to check that the installation succeeded is by running the hello-world image.
Docker offers the image to ascertain that the installation was completed successfully. Running the command will pull the image and run the test. Here's the command to run.

$ sudo docker run hello-world

Go ahead and have fun using Docker!

Method 2: Install Docker from the Ubuntu Repository

Docker is also available from the official Ubuntu 24.04 repository. This option is the easy way to install Docker, but you won't get the latest version. Nonetheless, you still manage to get Docker. Proceed as follows.

Step 1: Update Ubuntu's Repository

Similar to the previous method, we must update the Ubuntu repository before installing Docker.

$ sudo apt update

Step 2: Fetch and Install Docker

After the update, we can install Docker using the command below.

$ sudo apt-get install docker.io -y

Allow the installation to complete.

Step 3: Install Docker Dependencies

Although we've managed to install Docker, some dependency packages should be installed. Instead of installing them separately using APT, a better way is to install Docker as a snap package. Doing so will install all the Docker dependencies when installing the snap package. Run the snap install command below.

$ sudo snap install docker

Bingo! You've installed Docker on Ubuntu 24.04 from the official Ubuntu repository. You can check the Docker version to verify that it is installed and ready for use.

Conclusion

Docker is a reliable way of packaging and running applications in containers. The benefits of using Docker are numerous for a developer, and it all starts with knowing how to install it. This post gave a step-by-step process for installing Docker on Ubuntu 24.04. Hopefully, you've managed to get Docker up and running. View the full article
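One post-install step the article doesn't mention: to run docker without sudo, add your user to the docker group (standard upstream guidance; you'll need to log out and back in, or use newgrp, before it takes effect):

$ sudo usermod -aG docker $USER
$ newgrp docker
$ docker run hello-world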
  5. During the past decade, containers have revolutionized software development by introducing higher levels of consistency and scalability. Now, developers can work without the challenges of dependency management, environment consistency, and collaborative workflows. When developers explore containerization, they might learn about container internals, architecture, and how everything fits together. And, eventually, they may find themselves wondering about the differences between containerd and Docker and how they relate to one another. In this blog post, we’ll explain what containerd is, how Docker and containerd work together, and how their combined strengths can improve developer experience. What’s a container? Before diving into what containerd is, I should briefly review what containers are. Simply put, containers are processes with added isolation and resource management. Containers have their own virtualized operating system with access to host system resources. Containers also use operating system kernel features. They use namespaces to provide isolation and cgroups to limit and monitor resources like CPU, memory, and network bandwidth. As you can imagine, container internals are complex, and not everyone has the time or energy to become an expert in the low-level bits. This is where container runtimes, like containerd, can help. What’s containerd? In short, containerd is a runtime built to run containers. This open source tool builds on top of operating system kernel features and improves container management with an abstraction layer, which manages namespaces, cgroups, union file systems, networking capabilities, and more. This way, developers don’t have to handle the complexities directly. In March 2017, Docker pulled its core container runtime into a standalone project called containerd and donated it to the Cloud Native Computing Foundation (CNCF). By February 2019, containerd had reached the Graduated maturity level within the CNCF, representing its significant development, adoption, and community support. Today, developers recognize containerd as an industry-standard container runtime known for its scalability, performance, and stability. Containerd is a high-level container runtime with many use cases. It’s perfect for handling container workloads across small-scale deployments, but it’s also well-suited for large, enterprise-level environments (including Kubernetes). A key component of containerd’s robustness is its default use of Open Container Initiative (OCI)-compliant runtimes. By using runtimes such as runc (a lower-level container runtime), containerd ensures standardization and interoperability in containerized environments. It also efficiently deals with core operations in the container life cycle, including creating, starting, and stopping containers. How is containerd related to Docker? But how is containerd related to Docker? To answer this, let’s take a high-level look at Docker’s architecture (Figure 1). Containerd facilitates operations on containers by directly interfacing with your operating system. The Docker Engine sits on top of containerd and provides additional functionality and developer experience enhancements. How Docker interacts with containerd To better understand this interaction, let’s talk about what happens when you run the docker run command: After you select enter, the Docker CLI will send the run command and any command-line arguments to the Docker daemon (dockerd) via REST API call. 
dockerd will parse and validate the request, and then it will check that things like container images are available locally. If they’re not, it will pull the image from the specified registry. Once the image is ready, dockerd will shift control to containerd to create the container from the image. Next, containerd will set up the container environment. This process includes tasks such as setting up the container file system, networking interfaces, and other isolation features. containerd will then delegate running the container to runc using a shim process. This will create and start the container. Finally, once the container is running, containerd will monitor the container status and manage the lifecycle accordingly. Docker and containerd: Better together Docker has played a key role in the creation and adoption of containerd, from its inception to its donation to the CNCF and beyond. This involvement helped standardize container runtimes and bolster the open source community’s involvement in containerd’s development. Docker continues to support the evolution of the open source container ecosystem by continuously maintaining and evolving containerd. Containerd specializes in the core functionality of running containers. It’s a great choice for developers needing access to lower-level container internals and other advanced features. Docker builds on containerd to create a cohesive developer experience and comprehensive toolchain for building, running, testing, verifying, and sharing containers. Build + Run In development environments, tools like Docker Desktop, Docker CLI, and Docker Compose allow developers to easily define, build, and run single or multi-container environments and seamlessly integrate with your favorite editors or IDEs or even in your CI/CD pipeline. Test One of the largest developer experience pain points involves testing and environment consistency. With Testcontainers, developers don’t have to worry about reproducibility across environments (for example, dev, staging, testing, and production). Testcontainers also allows developers to use containers for isolated dependency management, parallel testing, and simplified CI/CD integration. Verify By analyzing your container images and creating a software bill of materials (SBOM), Docker Scout works with Docker Desktop, Docker Hub, or Docker CLI to help organizations shift left. It also empowers developers to find and fix software vulnerabilities in container images, ensuring a secure software supply chain. Share Docker Registry serves as a store for developers to push container images to a shared repository securely. This functionality streamlines image sharing, making maintaining consistency and efficiency in development and deployment workflows easier. With Docker building on top of containerd, the software development lifecycle benefits from the inner loop and testing to secure deployment to production. Wrapping up In this article, we discussed the relationship between Docker and containerd. We showed how containers, as isolated processes, leverage operating system features to provide efficient and scalable development and deployment solutions. We also described what containerd is and explained how Docker leverages containerd in its stack. Docker builds upon containerd to enhance the developer experience, offering a comprehensive suite of tools for the entire development lifecycle across building, running, verifying, sharing, and testing containers. 
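If you want to observe this hand-off on a Linux host with Docker installed, here is a hedged probe (the container name is illustrative, and exact process names can vary between Docker and containerd versions):

$ docker run -d --name demo-nginx nginx
# dockerd and containerd run as separate daemons; each running container is
# supervised by a containerd shim process (typically containerd-shim-runc-v2)
$ ps -eo pid,ppid,cmd | grep -E 'dockerd|containerd' | grep -v grep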
Start your next projects with containerd and other container components by checking out Docker’s open source projects and most popular open source tools. Learn more Subscribe to the Docker Newsletter. Get the latest release of Docker Desktop. Vote on what’s next! Check out our public roadmap. Have questions? The Docker community is here to help. New to Docker? Get started. View the full article
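Another hedged way to see that Docker's containers really live in containerd, as the article above describes: containerd's own ctr CLI lists them under the moby namespace that Docker uses (requires root on a standard Linux install):

$ sudo ctr --namespace moby containers list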
  6. At the heart of Docker's containerization process is the Dockerfile, a file that helps automate the creation of Docker images. In this blog post, we'll take a detailed look at what a Dockerfile is and how it works. Let's get started!

What is a Dockerfile?

A Dockerfile is a text file that contains instructions on how to build a Docker image. Each instruction is composed of a command followed by one or more arguments. By convention, commands are written in uppercase to distinguish them from arguments and make the Dockerfile more readable. Here is an example Dockerfile for a Node.js application:

FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]

Here are the sequential tasks that are executed when building a Docker image from this Dockerfile:

Docker starts by looking for the base image specified in the FROM instruction (node:20.11.1) in the local cache. If it's not found locally, Docker fetches it from Docker Hub.
Next, Docker creates a working directory inside the container's filesystem as specified by the WORKDIR instruction (/app).
The COPY instruction copies package.json into the /app directory in the container. This is crucial for managing project dependencies.
Docker then executes the RUN npm install command to install the dependencies defined in package.json.
After the installation of dependencies, Docker copies the remaining project files into the /app directory with another COPY instruction.
Finally, the CMD instruction sets the default command to run inside the container (node server.js), which starts the application.

Want to learn more about building a Docker image using a Dockerfile? Check out this blog post: How to Build a Docker Image With Dockerfile From Scratch.

Common Dockerfile Instructions

Below, we discuss some of the most important commands commonly used in a Dockerfile:

FROM: Specifies the base image for subsequent instructions. Every Dockerfile must start with a FROM command.
ADD / COPY: Both commands enable the transfer of files from the host to the container's filesystem. The ADD instruction is particularly useful when adding files from remote URLs or for the automatic extraction of compressed files from the local filesystem directly into the container's filesystem. Note that Docker recommends using COPY over ADD, especially when transferring local files.
WORKDIR: Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it in the Dockerfile. If the specified directory does not exist, it's created automatically.
RUN: Executes commands specified during the build step of the container. It can be used to install necessary packages, update existing packages, and create users and groups, among other system configuration tasks within the container.
CMD / ENTRYPOINT: Both provide default commands to be executed when a Docker image is run as a container. The main distinction is that the argument passed to the ENTRYPOINT command cannot be overridden, while the argument passed to the CMD command can.

For a comprehensive guide to all available Dockerfile instructions, refer to the official Docker documentation at Dockerfile reference.

Relationship Between Dockerfile Instructions and Docker Image Layers

Each instruction in a Dockerfile creates a new layer in the Docker image. These layers are stacked on top of each other, and each layer represents the change made from the layer below it.
The most important point to note here is that Docker caches these layers to speed up subsequent builds (more on this in the next section). As a general rule, any Dockerfile command that modifies the file system (such as FROM, RUN, and COPY) creates a new layer. Commands instructing how to build the image and run the container (such as WORKDIR, ENV, and ENTRYPOINT) add zero-byte-sized metadata layers to the created image.

To view the commands that create the image layers and the sizes they contribute to the Docker image, you can run the following command:

docker history <IMAGE_NAME>

You can also run the following command to find out the number of image layers:

docker inspect --format '{{json .RootFS.Layers}}' <IMAGE_NAME>

In this command, we use a Go template to extract the layers' information. For a deep dive into Docker image layers, check out our blog post: What Are Docker Image Layers and How Do They Work?

Dockerfile and Build Cache

When you build a Docker image using the Dockerfile, Docker checks each instruction (layer) against its build cache. If a layer has not changed (meaning the instruction and its context are identical to a previous build), Docker uses the cached layer instead of executing the instruction again. Let's see this in action. Below is the output we get from building a sample Node app using the Dockerfile in the previous section:

From the screenshot above, the build process took 1244.2 seconds. Building another Docker image (without making any changes to the application code or Dockerfile), the build time is drastically reduced to just 6.9 seconds, as shown below:

The significant decrease in build time for the second build demonstrates Docker's effective use of the build cache. Since there were no alterations in the Dockerfile instructions or the application code, Docker used the cached layers from the first build. One more important point to note is that caching has a cascading effect. Once an instruction is modified, all subsequent instructions, even if unchanged, will be executed afresh because Docker can no longer guarantee their outcomes are the same as before. This characteristic of Docker's caching mechanism has significant implications for the organization of instructions within a Dockerfile. In the upcoming section on Dockerfile best practices, we'll learn how to strategically order Dockerfile instructions to optimize build times.

Best Practices for Writing Dockerfiles

Below, we discuss three recommended best practices you should follow when writing Dockerfiles:

#1 Use a .dockerignore file

When writing Dockerfiles, ensure that only the files and folders required for your application are copied to the container's filesystem. To help with this, create a .dockerignore file in the same directory as your Dockerfile. In this file, list all the files and directories that are unnecessary for building and running your application, similar to how you would use a .gitignore file to exclude files from a git repository. Not including irrelevant files in the Docker build context helps to keep the image size small. Smaller images bring significant advantages: they require less time and bandwidth to download, occupy less storage space on disk, and consume less memory when loaded into a Docker container.

#2 Keep the number of image layers relatively small

Another best practice to follow while writing Dockerfiles is to keep the number of image layers as low as possible, as this directly impacts the startup time of the container.
But how can we effectively reduce the number of image layers? A simple method is to consolidate multiple RUN commands into a single command. Let's say we have a Dockerfile that contains three separate commands like these:

RUN apt-get update
RUN apt-get install -y nginx
RUN apt-get clean

This will result in three separate layers. However, by merging these commands into one, as shown below, we can reduce the number of layers from three to one.

RUN apt-get update && \
    apt-get install -y nginx && \
    apt-get clean

In this version, we use the && operator along with the \ for line continuation. The && operator executes commands sequentially, ensuring that each command is run only if the previous one succeeds. This approach is critical for maintaining the build's integrity by stopping the build if any command fails, thus preventing the creation of a defective image. The \ aids in breaking up long commands into more readable segments.

#3 Order Dockerfile instructions to leverage caching as much as possible

We know that Docker uses the build cache to try to avoid rebuilding any image layers that it has already built and that do not contain any noticeable changes. Due to this caching strategy, the order in which you organize instructions within your Dockerfile is important in determining the average duration of your build processes. The best practice is to place instructions that are least likely to change towards the beginning and those that change more frequently towards the end of the Dockerfile. This strategy is grounded in how Docker rebuilds images: Docker checks each instruction in sequence against its cache. If it encounters a change in an instruction, it cannot use the cache for this and all subsequent instructions. Instead, Docker rebuilds each layer from the point of change onwards. Consider the Dockerfile below:

FROM node:20.11.1
WORKDIR /app
COPY . /app
RUN npm install
CMD ["node", "server.js"]

It works fine, but there is an issue. On line 3, we copy the entire directory (including the application code) into the container. Following this, on line 4, we install the dependencies. This setup has a significant drawback: any modifications to the application code lead to the invalidation of the cache starting from this point. As a result, dependencies are reinstalled with each build. This process is not only time-consuming but also unnecessary, considering that dependency updates occur less frequently than changes to the application code. To better leverage Docker's cache, we can adjust our approach by initially copying only the package.json file to install dependencies, followed by copying the rest of the application code:

FROM node:20.11.1
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
CMD ["node", "server.js"]

This modification means that changes to the application code now only affect the cache from line 5 onwards. The installation of dependencies, happening before this, benefits from cache retention (unless there are changes to package.json), thus optimizing the build time.

Conclusion

In this blog post, we began by defining what a Dockerfile is, followed by a discussion of the most frequently used commands within a Dockerfile. We then explored the relationship between Dockerfile instructions and Docker image layers, as well as the concept of the build cache and how Docker employs it to improve build times. Lastly, we outlined three recommended best practices for writing Dockerfiles. With these insights, you now have the knowledge required to write efficient Dockerfiles.
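As a small companion to best practice #1 above, a minimal .dockerignore for the Node.js example might look like this (entries are illustrative; tailor them to your project):

node_modules
npm-debug.log
.git
.env
*.md
Dockerfile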
Interested in learning more about Docker? Check out the following courses from KodeKloud: Docker for the Absolute BeginnerDocker Certified Associate Exam CourseView the full article
  7. One question that often arises for both newcomers and seasoned Docker users alike is: "Where are Docker images stored?" Understanding the storage location of Docker images is crucial for managing disk space and optimizing Docker performance. Whether you're running Docker on macOS, Windows, or Linux, this blog post will guide you on how to locate Docker images on each platform. Let’s get started! Where Docker Images are Stored on macOSThe most important point to keep in mind is that Docker doesn't run natively on macOS. Instead, it operates within a virtual machine using HyperKit. For Docker Desktop on macOS, Docker stores Docker images within a disk image file. But what exactly is a disk image file? In simple terms, a disk image file is a single, large file that acts like a virtual "hard drive" for the Docker virtual machine. It contains the entire file system used by Docker, including images, containers, and volumes. To locate this disk image file, follow the steps below: #1 Open Docker DesktopStart Docker Desktop on your Mac if it's not already running. #2 Open settings windowOnce Docker Desktop is open, click on the Settings icon (highlighted below) to open the Settings window. #3 Navigate to the "Advanced" tab In the Settings window, click on the Resources tab, and then click on the Advanced section. #4 Find disk image location Scroll down a bit in the Advanced section until you see Disk image location. Here, Docker Desktop displays the path to the disk image file on your Mac's file system, as highlighted below: Note that the disk image is managed by the Docker Desktop virtual machine and is not meant to be manually modified or directly accessed. Instead, you should interact with Docker images through Docker CLI commands. Where Docker Images are Stored on LinuxUnlike macOS, Docker runs natively on Linux. This means that Docker directly interacts with the Linux kernel and filesystem without the need for a virtual machine. On Linux, all Docker-related data, including images, containers, volumes, and networks, is stored within the /var/lib/docker directory, known as the root directory. The exact directory inside the root directory where images are stored depends on the storage driver being used. For example, if the storage driver is overlay2, the image-related data will be stored within the subdirectory of /var/lib/docker/overlay2. To find out what storage driver you’re using, you can run the docker info command and look for a field named Storage Driver as shown below: Note: While it's useful to know where Docker stores its image-related data, manually altering files or directories within /var/lib/docker is not recommended, as it can disrupt Docker's operations and lead to data loss. Where Docker Images are Stored on WindowsFor Docker Desktop on Windows configured to use WSL 2 (recommended by Docker), Docker stores Docker images in a disk image. To locate it, follow the steps below: #1 Open Docker DesktopStart Docker Desktop on your machine if it's not already running. #2 Open settings windowOnce Docker Desktop is open, click on the Settings icon (highlighted below) to open the Settings window. #3 Navigate to the "Advanced" tab In the Settings window, click on the Resources tab, and then click on the Advanced section. #4 Find disk image location Scroll down a bit in the Advanced section until you see Disk image location. 
Here, Docker Desktop displays the path to the disk image file on your Windows file system, as shown below: Cleaning up Images Used by DockerAs you use Docker, over time, your system might accumulate many unused and "dangling" images. An unused image is an image not currently used by any container (stopped or running). A dangling image is an image that neither has a repository name nor a tag. It appears in Docker image listings as <none>:<none>. These images can take up significant disk space, leading to inefficiencies and potential storage issues. To address this, Docker provides a simple command: docker image prune. To remove all dangling images, run the following command: docker image prune This command will issue a prompt asking whether you want to remove all dangling images. Type y and hit enter to proceed. To remove unused images, run the following command: docker image prune -aWhen you run this command, it’ll issue a warning and will prompt you to confirm whether you want to proceed or not. Type y and hit enter to proceed. You should regularly clean up unused and dangling Docker images to maintain an efficient and clutter-free Docker environment. ConclusionNow you know where Docker images are stored on three different platforms: Linux, macOS, and Windows. If you’re still unable to find the location where Docker images are stored, which may be due to a specific configuration not covered in this blog post, consider asking questions on platforms such as Stack Overflow and the Docker Community forum. A deep understanding of Docker images, including knowledge of how to create and manage them, is crucial for mastering Docker. To help you with this, check out the blog posts below: How to Build a Docker Image With Dockerfile From ScratchHow to Create Docker Image From a Container?How to Remove Unused and Dangling Docker Images?Why and How to Tag a Docker Image?What Are Docker Image Layers and How Do They Work?How to Run a Docker Image as a Container?Interested in learning more about Docker? Check out the following courses from KodeKloud: Docker for the Absolute Beginner: This course will help you understand Docker using lectures and demos. You’ll get a hands-on learning experience and coding exercises that will validate your Docker skills. Additionally, assignments will challenge you to apply your skills in real-life scenarios.Docker Certified Associate Exam Course: This course covers all the required topics from the Docker Certified Associate Exam curriculum. The course offers several opportunities for practice and self-assessment. There are hundreds of research questions in multiple-choice format, practice tests at the end of each section, and multiple mock exams that closely resemble the actual exam pattern.View the full article
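To tie the storage-location and cleanup discussion above together, here is a hedged shell sketch for a Linux host. It assumes the default data root (/var/lib/docker) and the overlay2 storage driver; adjust the paths if docker info reports something different:

# Show which storage driver the daemon uses (typically overlay2 on modern Linux)
docker info --format '{{.Driver}}'

# Summarize how much space images, containers, and volumes consume
docker system df

# Peek at the size of the storage driver's directory
# (read-only inspection; do not modify anything under /var/lib/docker)
sudo du -sh /var/lib/docker/overlay2

# List dangling images (<none>:<none>) before removing them
docker images --filter "dangling=true"

# Remove dangling images without an interactive prompt
docker image prune -f

# Remove all images not used by any container (prompts unless -f is given)
docker image prune -a

As the article notes, treat everything under /var/lib/docker as read-only and manage images only through the Docker CLI.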
  8. While using Docker Compose, one error that you might encounter is: "Docker Compose Not Found". This error might seem daunting at first glance; however, it usually points to a few common issues that are relatively straightforward to resolve. In this blog post, we'll explore three common scenarios that trigger this error and provide fixes for each. Let's dive in! #1 Wrong Docker Compose Command Line SyntaxOne reason you might encounter the 'Docker Compose command not found' error is due to the use of incorrect Docker Compose command line syntax. In Docker Compose V1, the command line syntax is docker-compose, with a hyphen (-) between 'docker' and 'compose'. However, for Docker Compose V2, the syntax depends on the installation method. As a plugin: When installed as a plugin, Docker Compose V2 follows the docker compose syntax, where commands are issued with a space instead of a hyphen. For example, to check the version, you would use docker compose version.As a standalone binary: If Docker Compose V2 is installed as a standalone binary, the command syntax shifts to using a hyphen (docker-compose), similar to V1's approach. For instance, if I install Docker Compose V1 and then use the command docker compose version to check the installed version—note that I'm using the V2 syntax—I receive a message stating: “docker: 'compose' is not a docker command,” as shown below: Conversely, after installing the Docker Compose Plugin (V2) and using the V1 command line syntax to check the Docker Compose version, I see the error: “Command 'docker-compose' not found,” as shown below: Solution:The fix is straightforward—check the version using both command line syntaxes, and continue with the syntax that successfully returns the Docker Compose version. Note: Compose V1 stopped receiving updates in July 2023 and is no longer included in new Docker Desktop releases. Compose V2 has taken its place and is now integrated into all current Docker Desktop versions. For more details, review the migration guide to Compose V2. #2 Docker Compose Not InstalledAnother reason you might see the "Command Docker Compose Not Found" error is because you don’t have Docker Compose installed. On macOS, Windows, and Linux, when you install Docker Desktop, Docker Compose comes bundled with it, so you don’t need to install it separately. However, the situation can differ on Linux. You may have installed Docker Engine and Docker CLI but not Docker Compose. Note: Although recent versions of Docker Engine for Linux have started to include Docker Compose as part of the Docker package (especially with the introduction of the Docker Compose V2), this isn't universally the case for all Linux distributions or installation methods. For example, on my Ubuntu system, I have Docker Engine and Docker CLI installed but not Docker Compose. So, if I check the Docker Compose version using the command docker compose version (Docker Compose V2 syntax), I get an error saying “docker: ‘compose’ is not a docker command,” as shown below: If I check the version using the Docker Compose V1 syntax (docker-compose --version), I get the error: “Command ‘docker-compose’ not found,” as shown below: But how can we be sure that this error is because we don’t have Docker Compose installed and not for some other reason? Well, if we don’t find any result when searching for the Docker Compose binary, this means Docker Compose has not been installed. 
You can run the command below to search your entire filesystem for the docker-compose binary: sudo find / -name docker-compose After running the command, if you don’t get any results, then Docker Compose is not installed on your system. Solution:The solution is straightforward—Install Docker Compose. You can find installation instructions for your specific Linux distribution here. #3 Incorrect Path ConfigurationAnother common reason behind the "Command Docker Compose Not Found" error could be an incorrect PATH configuration. The PATH environment variable helps your operating system locate executables. If Docker Compose is installed in a non-standard location but not properly added to your PATH, your terminal won't be able to find and execute Docker Compose commands. SolutionFirst, locate the installation directory of Docker Compose on your system using the following command: sudo find / -name docker-compose Once identified, add this directory to your system's PATH environment variable. This ensures that your system can recognize and execute Docker Compose commands from any directory. Note: Make sure that the path you add to the PATH variable points to the directory containing the docker-compose binary, not to the file itself. For example, if the full path to the docker-compose binary is /usr/local/bin/docker-compose, you should add /usr/local/bin to your PATH, not /usr/local/bin/docker-compose. ConclusionIn this blog post, we walked through three common causes behind the "Docker Compose Not Found" error and detailed the steps to resolve each. Now, you're equipped with the knowledge to troubleshoot this issue, whether it arises from incorrect Docker Compose command line syntax, missing Docker Compose installation, or incorrect PATH configuration. Want to learn how to view logs for a multi-container application deployed via Docker Compose, so that you can troubleshoot when applications don’t run as expected? Check out our blog post: Docker-Compose Logs: How to View Log Output? Interested in learning more about Docker? Check out the following courses from KodeKloud: Docker for the Absolute Beginner: This course will help you understand Docker using lectures and demos. You’ll get a hands-on learning experience and coding exercises that will validate your Docker skills. Additionally, assignments will challenge you to apply your skills in real-life scenarios.Docker Certified Associate Exam Course: This course includes all the topics covered by the Docker Certified Associate Exam curriculum. The course offers several opportunities for practice and self-assessment. There are hundreds of research questions in multiple-choice format, practice tests at the end of each section, and multiple mock exams that closely resemble the actual exam pattern.View the full article
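Pulling the three fixes above into one place, here is a hedged shell sketch for a Debian or Ubuntu machine. The docker-compose-plugin package name assumes Docker's official apt repository is already configured, and /usr/local/bin is only an example location; substitute whatever path the find command reports:

# 1) Check which syntax works on this machine
docker compose version       # Compose V2 plugin syntax
docker-compose --version     # V1 / standalone binary syntax

# 2) If neither works, confirm the binary or plugin is really missing
sudo find / -name docker-compose 2>/dev/null

# On Debian/Ubuntu with Docker's official apt repository configured,
# the Compose V2 plugin can be installed like this:
sudo apt-get update
sudo apt-get install -y docker-compose-plugin

# 3) If the binary exists but isn't found, add its directory to PATH
#    (replace /usr/local/bin with the directory reported by find)
echo 'export PATH="$PATH:/usr/local/bin"' >> ~/.bashrc
source ~/.bashrc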
  9. NCache Java Edition is a distributed caching solution that helps Java applications run faster, handle more users, and stay reliable. In a world where users expect applications to respond quickly and without interruption, knowing how to use NCache Java Edition matters: it gives developers and businesses a way to serve data quickly and deliver a smooth experience, which makes it an important building block for great applications. This article is written for beginners and explains, clearly and step by step, how to add NCache to your Java applications. Whether you have been developing for years or are new to caching, it will give you a solid start with NCache Java Edition. Let’s begin with a step-by-step process for setting up a development workstation for NCache with Java. View the full article
  10. Eleven years ago, Solomon Hykes walked onto the stage at PyCon 2013 and revealed Docker to the world for the first time. The problem Docker was looking to solve? “Shipping code to the server is hard.” And the world of application software development changed forever. Docker was built on the shoulders of giants of the Linux kernel, copy-on-write file systems, and developer-friendly git semantics. The result? Docker has fundamentally transformed how developers build, share, and run applications. By “dockerizing” an app and its dependencies into a standardized, open format, Docker dramatically lowered the friction between devs and ops, enabling devs to focus on their apps — what’s inside the container — and ops to focus on deploying any app, anywhere — what’s outside the container, in a standardized format. Furthermore, this standardized “unit of work” that abstracts the app from the underlying infrastructure enables an “inner loop” for developers of code, build, test, verify, and debug, which results in 13X more frequent releases of higher-quality, more secure updates. The subsequent energy over the past 11 years from the ecosystem of developers, community, open source maintainers, partners, and customers cannot be understated, and we are so thankful and appreciative of your support. This has shown up in many ways, including the following: Ranked #1 “most-wanted” tool/platform by Stack Overflow’s developer community for the past four years 26 million monthly active IPs accessing 15 million repos on Docker Hub, pulling them 25 billion times per month 17 million registered developers Moby project has 67.5k stars, 18.5k forks, and more than 2,200 contributors; Docker Compose has 32.1k stars and 5k forks A vibrant network of 70 Docker Captains across 25 countries serving 167 community meetup groups with more than 200k members and 4800 total meetups 79,000+ customers The next decade In our first decade, we changed how developers build, share, and run any app, anywhere — and we’re only accelerating in our second! Specifically, you’ll see us double down on meeting development teams where they are to enable them to rapidly ship high-quality, secure apps via the following focus areas: Dev Team Productivity. First, we’ll continue to help teams take advantage of the right tech for the right job — whether that’s Linux containers, Windows containers, serverless functions, and/or Wasm (Web Assembly) — with the tools they love and the skills they already possess. Second, by bringing together the best of local and the best of cloud, we’ll enable teams to discover and address issues even faster in the “inner loop,” as you’re already seeing today with our early efforts with Docker Scout, Docker Build Cloud, and Testcontainers Cloud. GenAI. This tech is ushering in a “golden age” for development teams, and we’re investing to help in two areas: First, our GenAI Stack — built through collaboration with partners Ollama, LangChain, and Neo4j — enables dev teams to quickly stand up secure, local GenAI-powered apps. Second, our Docker AI is uniquely informed by anonymized data from dev teams using Docker, which enables us to deliver automations that eliminate toil and reduce security risks. Software Supply Chain. The software supply chain is heterogeneous, extensive, and complex for dev teams to navigate and secure, and Docker will continue to help simplify, make more visible, and manage it end-to-end. 
Whether it’s the trusted content “building blocks” of Docker Official Images (DOI) in Docker Hub, the transformation of ingredients into runnable images via BuildKit, verifying and securing the dev environment with digital signatures and enhanced container isolation, consuming metadata feedback from running containers in production, or making the entire end-to-end supply chain visible and issues actionable in Docker Scout, Docker has it covered and helps make a more secure internet! Dialing it past 11 While our first decade was fantastic, there’s so much more we can do together as a community to serve app development teams, and we couldn’t be more excited as our second decade together gets underway and we dial it past 11! If you haven’t already, won’t you join us today?! How has Docker influenced your approach to software development? Share your experiences with the community and join the conversation on LinkedIn. Let’s build, share, and run — together! View the full article
  11. The domain of GenAI and LLMs has been democratized and tasks that were once purely in the domain of AI/ML developers must now be reasoned with by regular application developers into everyday products and business logic. This is leading to new products and services across banking, security, healthcare, and more with generative text, images, and videos. Moreover, GenAI’s potential economic impact is substantial, with estimates it could add trillions of dollars annually to the global economy. Docker offers an ideal way for developers to build, test, run, and deploy the NVIDIA AI Enterprise software platform — an end-to-end, cloud-native software platform that brings generative AI within reach for every business. The platform is available to use in Docker containers, deployable as microservices. This enables teams to focus on cutting-edge AI applications where performance isn’t just a goal — it’s a necessity. This week, at the NVIDIA GTC global AI conference, the latest release of NVIDIA AI Enterprise was announced, providing businesses with the tools and frameworks necessary to build and deploy custom generative AI models with NVIDIA AI foundation models, the NVIDIA NeMo framework, and the just-announced NVIDIA NIM inference microservices, which deliver enhanced performance and efficient runtime. This blog post summarizes some of the Docker resources available to customers today. Docker Hub Docker Hub is the world’s largest repository for container images with an extensive collection of AI/ML development-focused container images, including leading frameworks and tools such as PyTorch, TensorFlow, Langchain, Hugging Face, and Ollama. With more than 100 million pull requests for AI/ML-related images, Docker Hub’s significance to the developer community is self-evident. It not only simplifies the development of AI/ML applications but also democratizes innovation, making AI technologies accessible to developers across the globe. NVIDIA’s Docker Hub library offers a suite of container images that harness the power of accelerated computing, supplementing NVIDIA’s API catalog. Docker Hub’s vast audience — which includes approximately 27 million monthly active IPs, showcasing an impressive 47% year-over-year growth — can use these container images to enhance AI performance. Docker Hub’s extensive reach, underscored by an astounding 26 billion monthly image pulls, suggests immense potential for continued growth and innovation. Docker Desktop with NVIDIA AI Workbench Docker Desktop on Windows and Mac helps deliver NVIDIA AI Workbench developers a smooth experience on local and remote machines. NVIDIA AI Workbench is an easy-to-use toolkit that allows developers to create, test, and customize AI and machine learning models on their PC or workstation and scale them to the data center or public cloud. It simplifies interactive development workflows while automating technical tasks that halt beginners and derail experts. AI Workbench makes workstation setup and configuration fast and easy. Example projects are also included to help developers get started even faster with their own data and use cases. Docker engineering teams are collaborating with NVIDIA to improve the user experience with NVIDIA GPU-accelerated platforms through recent improvements to the AI Workbench installation on WSL2. 
Check out how NVIDIA AI Workbench can be used locally to tune a generative image model to produce more accurate prompted results: In a near-term update, AI Workbench will use the Container Device Interface (CDI) to govern local and remote GPU-enabled environments. CDI is a CNCF-sponsored project led by NVIDIA and Intel, which exposes NVIDIA GPUs inside of containers to support complex device configurations and CUDA compatibility checks. This simplifies how research, simulation, GenAI, and ML applications utilize local and cloud-native GPU resources. With Docker Desktop 4.29 (which includes Moby 25), developers can configure CDI support in the daemon and then easily make all NVIDIA GPUs available in a running container by using the –device option via support for CDI devices. docker run --device nvidia.com/gpu=all <image> <command> LLM-powered apps with Docker GenAI Stack The Docker GenAI Stack lets teams easily integrate NVIDIA accelerated computing into their AI workflows. This stack, designed for seamless component integration, can be set up on a developer’s laptop using Docker Desktop for Windows. It helps deliver the power of NVIDIA GPUs and NVIDIA NIM to accelerate LLM inference, providing tangible improvements in application performance. Developers can experiment and modify five pre-packaged applications to leverage the stack’s capabilities. Accelerate AI/ML development with Docker Desktop Docker Desktop facilitates an accelerated machine learning development environment on a developer’s laptop. By tapping NVIDIA GPU support for containers, developers can leverage tools distributed via Docker Hub, such as PyTorch and TensorFlow, to see significant speed improvements in their projects, underscoring the efficiency gains possible with NVIDIA technology on Docker. Securing the software supply chain Docker Hub’s registry and tools, including capabilities for build, digital signing, Software Bill of Materials (SBOM), and vulnerability assessment via Docker Scout, allow customers to ensure the quality and integrity of container images from end to end. This comprehensive approach not only accelerates the development of machine learning applications but also secures the GenAI and LLM software supply chain, providing developers with the confidence that their applications are built on a secure and efficient foundation. “With exploding interest in AI from a huge range of developers, we are excited to work with NVIDIA to build tooling that helps accelerate building AI applications. The ecosystem around Docker and NVIDIA has been building strong foundations for many years and this is enabling a new community of enterprise AI/ML developers to explore and build GPU accelerated applications.” Justin Cormack, Chief Technology Officer, Docker “Enterprise applications like NVIDIA AI Workbench can benefit enormously from the streamlining that Docker Desktop provides on local systems. Our work with the Docker team will help improve the AI Workbench user experience for managing GPUs on Windows.” Tyler Whitehouse, Principal Product Manager, NVIDIA Learn more By leveraging Docker Desktop and Docker Hub with NVIDIA technologies, developers are equipped to harness the revolutionary power of AI, grow their skills, and seize opportunities to deliver innovative applications that push the boundaries of what’s possible. Check out NVIDIA’s Docker Hub library and NVIDIA AI Enterprise to get started with your own AI solutions. View the full article
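As a sketch of the CDI workflow described above: the nvidia-ctk command below comes from the NVIDIA Container Toolkit, and on Docker Engine 25 and later CDI support may also need to be switched on in /etc/docker/daemon.json via a CDI feature flag; both details are assumptions to verify against the documentation for your versions:

# Generate a CDI specification describing the GPUs on this host
# (nvidia-ctk ships with the NVIDIA Container Toolkit)
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# After enabling CDI in the daemon configuration, restart the daemon,
# then pass CDI devices to containers with --device:
sudo systemctl restart docker
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi

# A single GPU can also be addressed by index
docker run --rm --device nvidia.com/gpu=0 ubuntu nvidia-smi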
  12. We are racing toward the finish line at KubeCon + CloudNativeCon Europe, March 19 – 22, 2024 in Paris, France. Join the Docker “pit crew” at Booth #J3 for an incredible racing experience, new product demos, and limited-edition SWAG. Meet us at our KubeCon booth, sessions, and events to learn about the latest trends in AI productivity and best practices in cloud-native development with Docker. At our KubeCon booth (#J3), we’ll show you how building in the cloud accelerates development and simplifies multi-platform builds with a side-by-side demo of Docker Build Cloud. Learn how Docker and Test Containers Cloud provide a seamless integration within the testing framework to improve the quality and speed of application delivery. It’s not all work, though — join us at the booth for our Megennis Motorsport Racing experience and try to beat the best! Take advantage of this opportunity to connect with the Docker team, learn from the experts, and contribute to the ever-evolving cloud-native landscape. Let’s shape the future of cloud-native technologies together at KubeCon! Deep dive sessions from Docker experts Is Your Image Really Distroless? — Docker software engineer Laurent Goderre will dive into the world of “distroless” Docker images on Wednesday, March 20. In this session, Goderre will explain the significance of separating build-time and run-time dependencies to enhance container security and reduce vulnerabilities. He’ll also explore strategies for configuring runtime environments without compromising security or functionality. Don’t miss this must-attend session for KubeCon attendees keen on fortifying their Docker containers. Simplified Inner and Outer Cloud Native Developer Loops — Docker Staff Community Relations Manager Oleg Šelajev and Diagrid Customer Success Engineer Alice Gibbons tackle the challenges of developer productivity in cloud-native development. On Wednesday, March 20, they will present tools and practices to bridge the gap between development and production environments, demonstrating how a unified approach can streamline workflows and boost efficiency across the board. Engage, learn, and network at these events Security Soiree: Hands-on cloud-native security workshop and party — Join Sysdig, Snyk, and Docker on March 19 for cocktails, team photos, music, prizes, and more at the Security Soiree. Listen to a compelling panel discussion led by industry experts, including Docker’s Director of Security, Risk & Trust, Rachel Taylor, followed by an evening of networking and festivities. Get tickets to secure your invitation. Docker Meetup at KubeCon: Development & data productivity in the age of AI — Join us at our meetup during KubeCon on March 21 and hear insights from Docker, Pulumi, Tailscale, and New Relic. This networking mixer at Tonton Becton Restaurant promises candid discussions on enhancing developer productivity with the latest AI and data technologies. Reserve your spot now for an evening of casual conversation, drinks, and delicious appetizers. See you March 19 – 22 at KubeCon + CloudNativeCon Europe We look forward to seeing you in Paris — safe travels and prepare for an unforgettable experience! Learn more New to Docker? Create an account. Learn about Docker Build Cloud. Subscribe to the Docker Newsletter. Read about what rolled out in Docker Desktop 4.27, including synchronized file shares, Docker Init GA, a private marketplace for extensions, Moby 25, support for Testcontainers with ECI, Docker Build Cloud, and Docker Debug Beta. View the full article
  13. Security auditing and penetration testing are essential activities for any organization that wants to check for vulnerabilities and defend against security and network attacks. Kali Linux is a popular, globally used penetration testing and security forensics operating system that offers over 600 penetration testing applications and packages. It can be run on top of all major operating systems or installed as a standalone system. To use Kali Linux without affecting the host system, users can rely on virtualization: Kali Linux can run in Docker containers as well as in virtual machines. Running Kali in a virtual machine installs a separate OS and kernel and consumes more disk space. In Docker, Kali Linux runs inside small executable packages called containers, which use OS-level virtualization and share the host kernel. This makes Docker an effective and efficient choice for running Kali Linux. In this blog, we will demonstrate: Prerequisite: Install Docker on the System; How to Run Kali Linux in Docker; Bonus Tip: How to Mount a Volume With the Kali Linux Container; How to Remove Kali’s Container?; Conclusion. Prerequisite: Install Docker on the System To run Kali Linux in a Docker container, the user first needs to install Docker on the system. Docker is a widely used containerization platform that lets us build, deploy, and ship applications and software in isolated environments. Install Docker on Windows: On Windows, Docker and its components can be installed through the Docker Desktop application. First, enable the WSL and Virtual Machine Platform features, then download and install Docker Desktop from the official website. For detailed guidance, follow the “Install Docker Desktop” article. Install Docker on Linux: On Linux, Docker can be installed from the official repository of the installed Linux distribution. To install Docker on Debian or Ubuntu, go through the “Install Docker on Debian 12” or “Install Docker on Ubuntu” article, respectively. Install Docker on macOS: On macOS, the Docker installer can be downloaded from the official Docker website. Users can then install Docker by following our linked article “Install Docker Desktop on Mac”. The Docker commands and workflow remain the same on any operating system; for this demonstration of running Kali Linux in Docker, we will use Windows. How to Run Kali Linux in Docker? Kali Linux publishes the official “kali-rolling” image on Docker Hub, which can be used to run Kali inside a container. A Docker image is a template, a set of instructions, that describes how to build the container. To install and use Kali in a container, follow the steps below. Step 1: Pull Kali’s Official Image First, pull the image from Docker Hub. To pull the image, the user needs to be logged in to Docker Hub, Docker’s official registry.
docker pull kalilinux/kali-rolling For confirmation, list the Docker images: docker images Here, we have downloaded the “kali-rolling” image from Docker Hub: Step 2: Run Kali in a Container Now, run Kali Linux inside a container through the “docker run --name <cont-name> kalilinux/kali-rolling” command: docker run --name kali-cont -it kalilinux/kali-rolling In the given command, “--name” sets the container name, and “-it” opens an interactive pseudo-TTY terminal: Here, you can see that Kali’s root terminal is open on the screen. Step 3: Update Kali Now, update the Kali package index with “apt update”: apt update Here, 8 packages can be upgraded: Step 4: Upgrade Kali’s Packages To upgrade the packages in Kali, execute the “apt upgrade” command. The “-y” option automatically answers “yes” to the prompts, including the one about the additional disk space the upgrade will use: apt upgrade -y Step 5: Install Essential Packages To install essential packages in Kali Linux, execute the “apt install <package-name>” command: apt install nikto curl nmap nano git -y Here, we have installed “nikto”, “curl”, “nmap”, “nano”, and “git” in the Kali Linux container: Bonus Tip: Add a New User in the Kali Linux Container Sometimes, you may want to create an unprivileged account instead of working as root. This is also recommended for the security of Kali’s container. The new account can still perform administrative tasks through sudo, but it does not carry the full, unrestricted privileges of the root account. To add a Kali user in the container, utilize the “adduser <user-name>” command: adduser kaliuser Now, add the new user to the sudo group. For this purpose, run the command below: usermod -aG sudo kaliuser To exit Kali’s terminal in the Docker container, simply run the “exit” command: exit That is how a user can run Kali Linux in a Docker container. Bonus Tip: How to Mount a Volume With the Kali Linux Container? A volume is used to persist a container’s data outside the container, mostly for backup purposes. A mounted volume also acts as a shared directory that is accessible to both the Docker container and the host system. To mount a volume in Kali’s container, follow the steps below. Step 1: List Docker Containers List the containers in Docker using the “docker ps” command. Here, to view both stopped and running containers, we have added the “-a” flag: docker ps -a Note the ID of the Kali container from the displayed result: Step 2: Save Kali’s Container as a New Image Next, save a copy of Kali’s container as a new Docker image using the “docker commit <cont-id> <new-image-name>” command: docker commit 16de59fc563d updated-kali-image This image copy will be used to run the new Kali container and mount the volume.
We created the image from the container so that we can preserve the previous state and data of Kali’s Docker container: For verification, view the Docker images using the command below: docker images Here, you can see that we have generated a new Docker image from the Kali container: Step 3: Run the New Kali Container and Mount the Volume Now, run the generated Docker image to start the new Kali container and mount the volume using the “-v” option: docker run -it --name new-kalicont -v C:/Users/Dell/Documents/kali:/root/kali updated-kali-image In the above command, we have mounted the host directory “C:/Users/Dell/Documents/kali” to the container directory “/root/kali”: Step 4: Open the Mounted Volume Directory Now, navigate to the container directory where the volume is mounted using “cd”: cd /root/kali Step 5: Create a File Next, create a new file and add some content to it with the “echo” command. This step is used for verification purposes: echo "Kali Docker Container" >> text.txt To view the content of the file, run the “cat <file-name>” command: cat text.txt Now, let’s check whether this file is shared and accessible on the host machine. Step 6: Verification For confirmation, exit the Docker container terminal using the “exit” command. Then, navigate to the mounted directory using “cd”: cd C:/Users/Dell/Documents/kali To list the files and folders of the opened directory, run the “ls” command: ls Here, you can see that the file “text.txt” created in Kali’s container is also visible in the mounted directory. This means we have successfully mounted a volume with the Kali Linux container: View the content of the file using the “cat” command: cat text.txt This is how we can mount a volume with a Docker container and preserve the container’s data. How to Remove Kali’s Container? To remove Kali Linux running in a Docker container, delete the container. To do so, first stop the running container, then run the “docker rm” command. For demonstration, go through the following steps. Step 1: Stop the Docker Container First, stop the running container using the “docker stop <cont-name/cont-id>” command: docker stop new-kalicont Step 2: Remove the Container Then, delete the Kali Linux container using the “docker rm <cont-name/cont-id>” command: docker rm new-kalicont That covers how to install and use Kali Linux in a Docker container. Conclusion To run Kali Linux in Docker, first download the image from Docker Hub. After that, run the image to set up Kali Linux in a Docker container through the “docker run -it kalilinux/kali-rolling” command. Users can also mount an external volume to a Docker container through the “-v” option. This post has explained how to run Kali Linux in Docker. View the full article
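If you plan to repeat this setup, the manual steps above can be baked into a reusable image. Below is a hedged Dockerfile sketch; the image tag, user name, and package list simply mirror the walkthrough and can be adjusted:

# Dockerfile: a reusable Kali image with the tools and the unprivileged user baked in
FROM kalilinux/kali-rolling

# Install the same essential packages (plus sudo) in a single layer
RUN apt-get update && \
    apt-get install -y nikto curl nmap nano git sudo && \
    rm -rf /var/lib/apt/lists/*

# Create the unprivileged user and give it sudo access
RUN useradd -m -s /bin/bash kaliuser && \
    usermod -aG sudo kaliuser

USER kaliuser
WORKDIR /home/kaliuser
CMD ["/bin/bash"]

Build it with docker build -t my-kali . and run it with docker run -it -v <host-dir>:/home/kaliuser/kali my-kali to get the same environment, volume included, without rerunning the apt and adduser steps.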
  14. After installing Docker, a daemon is created to manage it on the host operating system. The Docker daemon, commonly referred to as dockerd, is responsible for managing Docker images, containers, and other services. As with other services, systemctl can be used to manage the dockerd service. systemctl is a command-line utility for managing systemd services on Linux distributions that use the systemd init system. In this guide, I will demonstrate how to manage the Docker service on Linux using the systemctl tool. How to Start the Docker Service Using the systemctl Command By default, on Linux, the Docker service starts on boot. However, in many cases you may want to manage it manually, for example when troubleshooting or when something behaves abnormally. The Docker service and socket units can easily be managed with systemctl. The command to start the Docker service is given below: sudo systemctl start docker.service The above command does not produce any output to indicate that the service has started. To determine whether the Docker service is active and running, use the status option with the systemctl command and the service name. sudo systemctl status docker Note that, in the above commands, the .service extension is optional and can be omitted. How to Manage Docker Boot Settings Using the systemctl Command As mentioned earlier, on most modern Linux distributions the Docker service starts automatically on boot. To manage this behavior manually, the systemctl command can be used. For example, if you want to reduce boot time and save resources by not starting the Docker service on boot, simply disable it. sudo systemctl disable docker Disabling the Docker service does not stop it immediately; the service remains active until explicitly stopped. What changes is that the symbolic link that keeps the service enabled on boot is removed, so on the next boot the service will not start automatically. To start Docker, use systemctl start with the service name, and to stop it, use systemctl stop. sudo systemctl stop docker And to start it on boot again, enable the service. sudo systemctl enable docker Enabling the service recreates the symbolic link in systemd's multi-user.target.wants directory. How to Start the Docker Service Manually If you do not want to use the systemctl command-line utility to start the Docker service, it can be started manually using the dockerd command with sudo privileges. sudo dockerd To stop it, press the Ctrl+C keys. Conclusion The systemctl administration tool is also capable of handling the Docker service on Linux. By default, the Docker service is enabled on boot, but it can also be managed manually using the systemctl command. To start an inactive Docker service, use systemctl start docker; to prevent it from loading on boot, use systemctl disable docker. View the full article
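For quick reference, here is a small sketch of the systemctl and journalctl invocations that usually accompany the workflow above; all of them are standard systemd commands:

# Check whether the service is running and whether it starts on boot
systemctl is-active docker
systemctl is-enabled docker

# Enable and start the service in one step
sudo systemctl enable --now docker

# Restart the daemon (for example, after editing /etc/docker/daemon.json)
sudo systemctl restart docker

# Follow the daemon's logs while troubleshooting
journalctl -u docker.service -f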
  15. Docker is a well-liked dockerization platform that dockerizes applications and software in an isolated environment known as a container. While executing applications in containers, users are usually required to access the containerized application outside the container. For this purpose, users need to apply the port forwarding technique. Port forwarding in Docker is a process that enables us to expose the container port on the Docker host system. It permits us to run the application in an isolated environment and also make it accessible from outside the container on a user machine. This post will demonstrate: How to Forward Port in Docker Using “-p” or “–publish” Tag How to Forward Port in Docker Using Docker Compose How to Forward Port to a Specific Network Conclusion How to Forward Port in Docker Using “-p” or “–publish” Tag To forward a port in Docker, the user needs to publish the container port on the Docker host. For this purpose, run the container on the docker host using the “-p” or “–publish” tag in the “docker run” command. For proper demonstration, follow the below instructions. Step 1: Make Dockerfile First, create a file and set its name as “Dockerfile”. This file contains the textual instructions to create the template of the Docker container. For instance, let’s dockerize the simple Golang program using the below snippet in the Dockerfile: FROM golang:1.8 WORKDIR /go/src/app COPY main2.go . RUN go build -o webserver . EXPOSE 8080 CMD ["./webserver"] In the given snippet: “FROM” command is utilized to set the base image. “WORKDIR” defines the container’s working directory. “COPY” command will create a copy of the program file in the container-specified path. “RUN” command will execute the specified command in the container. “EXPOSE” command specifies the port where the container will be listened to. “CMD” specifies the executable points of the container. Step 2: Create a Golang Program Next, create another file named “main2.go” and paste the below provided Golang program that prints a simple string on port “8080”: package main import ( "fmt" "log" "net/http" ) func handler(w http.ResponseWriter, r *http.Request) { html := ` <!DOCTYPE html> <html> <head> <title>Hello Golang!</title> <style> body { background-color: #D2B48C; } .container { text-align: center; padding: 50px; } </style> </head> <body> <div class="container"> <h1>Hello! Welcome to LinuxHint Tutorial</h1> </div> </body> </html>` w.Header().Set("Content-Type", "text/html") fmt.Fprint(w, html) } func main() { http.HandleFunc("/", handler) log.Fatal(http.ListenAndServe("0.0.0.0:8080", nil)) } Step 3: Generate Container’s Snapshot Now, generate a snapshot for the container from the above specified Dockerfile. For this purpose, first, navigate to the directory where the Dockerfile is created using the “cd <path-to-working-dir>” command: Next, generate the new container image using the given command: docker build -t golang-img . The given result shows that the image is created according to the provided build context. Step 4: Forward Host Port to Container Port Now, execute the container and forward the host port to the container port to access the dockerize app outside the container on a local machine. 
For this purpose, run the “docker run –name <cont-name> -p <host-port>:<cont-port> <image-name>” command: docker run --name go-cont -p 8080:8080 golang-img Here, the “-p” flag is utilized to publish the container executing port on the docker host: Step 5: Verification For verification, view the running containers using “docker ps”: docker ps In the below result, the containerized application is listening on available network interfaces “0.0.0.0” on published port “8080”. It implies that the port is forwarded to the running application on the host machine: Now, launch the browser and navigate to “http://localhost:8080/” and verify whether the port forwarding technique is applied or not and if the containerized application is accessible outside the container on the host system: Here we have successfully forwarded the host port to the container port and the application is accessible on the docker host. How to Forward Port in Docker Using Docker Compose To forward the container port to the host to access the containerized application from outside the container, the user can utilize the “port” key in the compose yaml file. Docker compose is a Docker service that enables us to run different services and applications in different containers. Using the “docker-compose.yml” file, the user can also forward the container port to the host machine and have an application connection to the outside world. Check out the below procedure for illustrations. Step 1: Make Compose File First, generate a file named “docker-compose.yml” file and add the following content to the file: version: "3" services: web: build: . ports: - 8080:8080 In the above snippet, the “ports” key is used to connect the host to the container port. Here, the first value is the host port, and the second value is the container port. Step 2: Launch App After specifying the instructions in the compose file, launch the application in a container using the “docker-compose up” command: docker-compose up Step 3: Verification For verification, list down the compose containers using the “docker-compose ps” command: docker-compose ps -a To check if the container is accessible on the host machine, navigate to the “http://localhost:8080/” URL. Here, you can see we have effectively forwarded the container port on the host: How to Forward Port to Specific Network To forward a container port to a specific network, the user needs to specify the network on which they want to access the container using the “–network” option. Look at the given steps for demonstration. Step 1: Create a Network Create a new network using the “docker network create <net-name>” command. By default, this network will use a bridge driver: docker network create mygo-network To view the Docker networks, utilize the “docker network ls” command: docker network ls Here, we have successfully created “mygo-network” that is using “bridge” driver: Step 2: Map Network To run and access the container on the specific network using the port forwarding technique, use the below command: docker run -p 8080:8080 --network mygo-network golang-img In the given command, the “-p” option publishes the container on a specified port of the specified network. Here, the “–network” option is used to define the docker network: For verification, again navigate to the “http://localhost:8080” port and check if the container is accessible on the specified network or not: The above output indicates that we have effectively forwarded the container port on a specific network. 
Note: While using the “host” network, the user does not need to publish the port from container to host using the “-p” or “--publish” option. Forward Port to a Specific Network Using Docker Compose Add the following snippet to the “docker-compose.yml” file. In the below snippet, the “networks” key is used to attach the service to the “mygo-network” network created earlier, which is declared as external because it already exists outside of Compose:

version: "3"
services:
  web:
    build: .
    ports:
      - 8080:8080
    networks:
      - mygo-network
networks:
  mygo-network:
    external: true

Now, launch the application in a container using the “docker-compose up” command: docker-compose up We have covered the methods for port forwarding in Docker. Conclusion To forward a port in Docker and access the dockerized application from outside the container, the user can either use the “-p” or “--publish” option in the “docker run” command or use the “ports” key in the Docker Compose file. To access the container on a specific network, the user can forward the container port on that network by using the “--network <network>” option. This blog has demonstrated the techniques to forward ports in Docker. View the full article
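Beyond the basic host:container mapping shown above, the -p/--publish flag accepts a few useful variations. The sketch below reuses the golang-img image from this article and adds -d so the containers run in the background; the container names and host ports are arbitrary:

# Publish on a different host port (host 9090 -> container 8080)
docker run -d --name go-cont-alt -p 9090:8080 golang-img

# Bind the published port to the loopback interface only,
# so the app is reachable just from the Docker host itself
docker run -d --name go-cont-local -p 127.0.0.1:8081:8080 golang-img

# Let Docker pick a random free host port for every EXPOSEd container port
docker run -d --name go-cont-random -P golang-img

# Show which host ports a container's ports were mapped to
docker port go-cont-random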
  16. Hackers are exploiting misconfigured servers running Docker, Confluence, and other services in order to drop cryptocurrency miners. Researchers at Cado Security Labs recently observed one such malware campaign, noting how threat actors are using multiple “unique and unreported payloads”, including four Golang binaries, to automatically discover Apache Hadoop YARN, Docker, Confluence, and Redis hosts vulnerable to CVE-2022-26134, an unauthenticated, remote OGNL injection vulnerability that allows remote code execution. This flaw was first discovered two years ago, when threat actors targeted Confluence servers (typically the confluence user on Linux installations). At the time, the researchers said internet-facing Confluence servers were at “very high risk” and urged IT teams to apply the patch immediately. It seems that even now, two years later, not all users have installed the available fixes. Unidentified threat The tools are also designed to exploit the flaw and drop a cryptocurrency miner, spawn a reverse shell, and enable persistent access to the compromised hosts. Cryptocurrency miners are popular among cybercriminals because they take advantage of a server’s high compute power to generate almost untraceable profits. One of the most popular crypto-miners out there is called XMRig, a small program that mines the Monero currency. On the victim’s side, however, not only are their servers rendered unusable, but the miners also rack up their electricity bill fairly quickly. For now, Cado is unable to attribute the campaign to any specific threat actor, saying it would need the help of law enforcement for that: “As always, it’s worth stressing that without the capabilities of governments or law enforcement agencies, attribution is nearly impossible – particularly where shell script payloads are concerned,” it said. Still, it added that the shell script payloads are similar to ones seen in attacks carried out by TeamTNT and WatchDog. More from TechRadar Pro This new Linux malware floods machines with cryptominers and DDoS bots Here's a list of the best firewalls around today These are the best endpoint security tools right now View the full article
  17. If you are a Linux system administrator who provides support for developers, chances are you’ve heard of Docker. If not, this software solution will make… The post How to Setup Apache Web Server in a Docker Container first appeared on Tecmint: Linux Howtos, Tutorials & Guides. View the full article
  18. Docker Swarm is a popular container orchestration technology that simplifies the administration of containerized applications. While Docker Swarm provides strong capabilities for deploying and scaling applications, it is also critical to monitor and report on the performance and health of your Swarm clusters. In this post, we will look at logging and monitoring in a Docker Swarm environment, along with best practices, tools, and tactics for keeping your cluster running smoothly. The Importance of Logging and Monitoring Before we delve into the technical aspects of logging and monitoring in a Docker Swarm environment, let’s understand why these activities are crucial in a containerized setup. View the full article
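The excerpt above is only the introduction, so as a starting point here is a hedged sketch of the built-in commands most Swarm operators reach for first; "web" is a hypothetical service name, and docker service logs assumes a logging driver that supports reading logs (such as the default json-file):

# Cluster health: node availability and manager status
docker node ls

# Where a service's tasks are scheduled and their current state
docker service ps web

# Aggregated logs from all tasks of a service (add -f to follow)
docker service logs web

# Live resource usage of the containers on this node
docker stats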
  19. By leveraging the wide array of public images available on Docker Hub, developers can accelerate development workflows, enhance productivity, and, ultimately, ship scalable applications that run like clockwork. When building with public content, acknowledging the potential operational risks associated with using that content without proper authentication is crucial. In this post, we will describe best practices for mitigating these risks and ensuring the security and reliability of your containers. Import public content locally There are several advantages to importing public content locally. Doing so improves the availability and reliability of your public content pipeline and protects you from failed CI builds. By importing your public content, you can easily validate, verify, and deploy images to help run your business more reliably. For more information on this best practice, check out the Open Container Initiative’s guide on Consuming Public Content. Configure Artifact Cache to consume public content Another best practice is to configure Artifact Cache to consume public content. Azure Container Registry’s (ACR) Artifact Cache feature allows you to cache your container artifacts in your own Azure Container Registry, even for private networks. This approach limits the impact of rate limits and dramatically increases pull reliability when combined with geo-replicated ACR, allowing you to pull artifacts from the region closest to your Azure resource. Additionally, ACR offers various security features, such as private networks, firewall configuration, service principals, and more, which can help you secure your container workloads. For complete information on using public content with ACR Artifact Cache, refer to the Artifact Cache technical documentation. Authenticate pulls with public registries We recommend authenticating your pull requests to Docker Hub using subscription credentials. Docker Hub offers developers the ability to authenticate when building with public library content. Authenticated users also have access to pull content directly from private repositories. For more information, visit the Docker subscriptions page. Microsoft Artifact Cache also supports authenticating with other public registries, providing an additional layer of security for your container workloads. Following these best practices when using public content from Docker Hub can help mitigate security and reliability risks in your development and operational cycles. By importing public content locally, configuring Artifact Cache, and setting up preferred authentication methods, you can ensure your container workloads are secure and reliable. Learn more about securing containers Try Docker Scout to assess your images for security risks. Looking to get up and running? Use our Quickstart guide. Have questions? The Docker community is here to help. Subscribe to the Docker Newsletter to stay updated with Docker news and announcements. Additional resources for improving container security for Microsoft and Docker customers Visit Microsoft Learn. Read the introduction to Microsoft’s framework for securing containers. Learn how to manage public content with Azure Container Registry. View the full article
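As a concrete illustration of importing public content locally and authenticating pulls, here is a hedged shell sketch; registry.example.com, the repository path, and the nginx tag are placeholders to adapt to your own registry layout:

# Authenticate to Docker Hub so image pulls are tied to your subscription
docker login -u <your-docker-id>

# Import a public image, then push a copy into your own registry
docker pull nginx:1.25-alpine
docker tag nginx:1.25-alpine registry.example.com/base-images/nginx:1.25-alpine
docker push registry.example.com/base-images/nginx:1.25-alpine

# Downstream builds then reference the imported copy instead of the public upstream:
# FROM registry.example.com/base-images/nginx:1.25-alpine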
  20. Docker Desktop 4.28 introduces updates to file-sharing controls, focusing on security and administrative ease. Responding to feedback from our business users, this update brings refined file-sharing capabilities and path allow-listing, aiming to simplify management and enhance security for IT administrators and users alike. Along with our investments in bringing access to cloud resources within the local Docker Desktop experience with Docker Build Cloud Builds view, this release provides a more efficient and flexible platform for development teams. Introducing enhanced file-sharing controls in Docker Desktop Business As we continue to innovate and elevate the Docker experience for our business customers, we’re thrilled to unveil significant upgrades to the Docker Desktop’s Hardened Desktop feature. Recognizing the importance of administrative control over Docker Desktop settings, we’ve listened to your feedback and are introducing enhancements prioritizing security and ease of use. For IT administrators and non-admin users, Docker now offers the much-requested capability to specify and manage file-sharing options directly via Settings Management (Figure 1). This includes: Selective file sharing: Choose your preferred file-sharing implementation directly from Settings > General, where you can choose between VirtioFS, gRPC FUSE, or osxfs. VirtioFS is only available for macOS versions 12.5 and above and is turned on by default. Path allow-listing: Precisely control which paths users can share files from, enhancing security and compliance across your organization. Figure 1: Display of Docker Desktop settings enhanced file-sharing settings. We’ve also reimagined the Settings > Resources > File Sharing interface to enhance your interaction with Docker Desktop (Figure 2). You’ll notice: Clearer error messaging: Quickly understand and rectify issues with enhanced error messages. Intuitive action buttons: Experience a smoother workflow with redesigned action buttons, making your Docker Desktop interactions as straightforward as possible. Figure 2: Displaying settings management in Docker Desktop to notify business subscribers of their access rights. These enhancements are not just about improving current functionalities; they’re about unlocking new possibilities for your Docker experience. From increased security controls to a more navigable interface, every update is designed with your efficiency in mind. Refining development with Docker Desktop’s Builds view update Docker Desktop’s previous update introduced Docker Build Cloud integration, aimed at reducing build times and improving build management. In this release, we’re landing incremental updates that refine the Builds view, making it easier and faster to manage your builds. New in Docker Desktop 4.28: Dedicated tabs: Separates active from completed builds for better organization (Figure 3). Build insights: Displays build duration and cache steps, offering more clarity on the build process. Reliability fixes: Resolves issues with updates for a more consistent experience. UI improvements: Updates the empty state view for a clearer dashboard experience (Figure 4). These updates are designed to streamline the build management process within Docker Desktop, leveraging Docker Build Cloud for more efficient builds. Figure 3: Dedicated tabs for Build history vs. Active builds to allow more space for inspecting your builds. Figure 4: Updated view supporting empty state — no Active builds. 
To explore how Docker Desktop and Docker Build Cloud can optimize your development workflow, read our Docker Build Cloud blog post. Experience the latest Builds view update to further enrich your local, hybrid, and cloud-native development journey. These Docker Desktop updates support improved platform security and a better user experience. By introducing more detailed file-sharing controls, we aim to provide developers with a more straightforward administration experience and secure environment. As we move forward, we remain dedicated to refining Docker Desktop to meet the evolving needs of our users and organizations, enhancing their development workflows and agility to innovate. Join the conversation and make your mark Dive into the dialogue and contribute to the evolution of Docker Desktop. Use our feedback form to share your thoughts and let us know how to improve the Hardened Desktop features. Your input directly influences the development roadmap, ensuring Docker Desktop meets and exceeds our community and customers’ needs. Learn more Authenticate and update to receive the newest Docker Desktop features per your subscription level. New to Docker? Create an account. Read our latest blog on synchronized file shares. Read about what rolled out in Docker Desktop 4.27, including synchronized file shares, Docker Init GA, a private marketplace for extensions, Moby 25, support for Testcontainers with ECI, Docker Build Cloud, and Docker Debug Beta. Learn about Docker Build Cloud. Subscribe to the Docker Newsletter. View the full article
  21. Developing and running secure Docker applications demands a strategic approach, encompassing considerations such as keeping images free of unnecessary bloat and controlling how they are accessed. One crucial aspect to master in Docker development is understanding image layering and optimization. Docker images are constructed using layers, each representing specific changes or instructions in the image's build process. In this article, we'll delve into the significance of Docker image layering, the importance of choosing minimal base images, and practical approaches like multi-stage builds. Additionally, we'll discuss the critical practices of running applications as non-root users, checking images for vulnerabilities using tools like Docker Scout, and implementing Docker Content Trust for image integrity. This comprehensive guide aims to equip developers and operators with actionable insights to enhance the security and efficiency of Docker applications.

Understanding Docker image layering

Before we jump into Docker security aspects, we need to understand Docker image layering and optimization. For a better understanding, let's consider this Dockerfile, retrieved from a sample repository. It's a simple React program that prints "Hello World." The core code uses React, a JavaScript library for building user interfaces.

Docker images comprise layers, and each layer represents a set of file changes or instructions in the image's construction. These layers are stacked on each other to form the complete image (Figure 1). To combine them, a union filesystem is created, which takes all of the layers of the image and overlays them together. These layers are immutable. When you're building an image, you're simply creating new filesystem diffs, not modifying previous layers.

Figure 1: Visual representation of layers in a Docker image.

When you build a Docker image, each instruction in your Dockerfile creates a new layer. Layers are cached, so if you make a change in your code and rebuild the image, only the layers affected by that change will be recreated, saving time and bandwidth. This layering system makes images efficient to use. You might notice that there are two COPY instructions (as shown in Figure 1). The first COPY instruction copies only package.json (and potentially package-lock.json) into the image. The second COPY instruction copies the remaining application code (excluding files already copied in the first COPY command). If only application code changes, the first two layers are cached, avoiding re-downloading and reinstalling dependencies, which can significantly speed up builds.

1. Choose a minimal base image

Docker Hub has millions of images, and choosing the right image for your application is important. It is always better to use a minimal base image with a small size, as slimmer images contain fewer dependencies, resulting in a smaller attack surface. A smaller image not only improves your image security, it also reduces the time needed to pull and push images, optimizing the overall development lifecycle.

Figure 2: Example of Docker images with different sizes.

As depicted in Figure 2, we opted for the node:21.6-alpine3.18 image for the Node application below because of its smaller footprint: the Alpine variant omits the additional tools and packages present in the default Node image. This decision aligns with good security practices, as it minimizes the attack surface by eliminating components that are unnecessary for running your application. 
# Use the official Node.js image with Alpine Linux as the base image FROM node:21.6-alpine3.18 # Set the working directory inside the container to /app WORKDIR /app # Copy package.json and package-lock.json to the working directory COPY package*.json ./ # Install Node.js dependencies based on the package.json RUN npm install # Copy all files from the current directory to the working directory in the container COPY . . # Expose port 3000 EXPOSE 3000 # Define the command to run your application when the container starts CMD ["npm", "start"] 2. Use multi-stage builds Multi-stage builds offer a great way to streamline Docker images, making them smaller and more secure. They allow us to trim down a hefty 1.9 GB image to a lean 140 MB by using different build stages. In this approach, we leverage multiple FROM statements and carefully pick only the necessary pieces from one stage to another. We have converted our Dockerfile to a multi-stage one (Figure 3). In the first stage, we use a Node.js image to build the app, manage dependencies, and create application files (see the Dockerfile below). In the second stage, we copy the lightweight files generated in the first step and use Nginx to run them. We skip the build tool required to build the app in the final stage. This is why the final image is small and suitable for the production environment. Also, this is a great representation that we don’t need the heavyweight system on which we build; we can copy them to a lighter runner to run the app. Figure 3: High-level representation of Docker multi-stage build. # Stage 1: Build the application FROM node:21.6-alpine3.18 AS builder # Set the working directory for the build stage WORKDIR /app # Copy package.json and package-lock.json COPY package*.json ./ # Install dependencies RUN npm install # Copy the application source code into the container COPY . . # Build the application RUN npm run build # Stage 2: Create the final image FROM nginx:1.20 # Set the working directory within the container WORKDIR /app # Copy the built application files from the builder stage to the nginx html directory COPY --from=builder /app/build /usr/share/nginx/html # Expose port 80 for the web server EXPOSE 80 # Start nginx in the foreground CMD ["nginx", "-g", "daemon off;"] You can access this Dockerfile directly from a repository on GitHub. 3. Check your images for vulnerabilities using Docker Scout Let’s look at the following multi-stage Dockerfile: # Stage 1: Build the application FROM node:21.6-alpine3.18 AS builder # Set the working directory for the build stage WORKDIR /app # Copy package.json and package-lock.json COPY package*.json ./ # Install dependencies RUN npm install # Copy the application source code into the container COPY . . # Build the application RUN npm run build # Stage 2: Create the final image FROM nginx:1.20 # Set the working directory within the container WORKDIR /app # Copy the built application files from the builder stage to the nginx html directory COPY --from=builder /app/build /usr/share/nginx/html # Expose port 80 for the web server EXPOSE 80 # Start nginx in the foreground CMD ["nginx", "-g", "daemon off;"] You can run the following command to build a Docker image: docker build -t react-app-multi-stage . -f Dockerfile.multi Once the build process is complete, the CLI lets you view a summary of image vulnerabilities and recommendations. That’s what Docker Scout is all about. 
 => exporting to image                                                      0.0s
 => => exporting layers                                                     0.0s
 => => writing image sha256:f348bcb19411fa1c4abf2e682f3dded7963c0c0c9b39c31804df5cd0e0f185d9  0.0s
 => => naming to docker.io/library/react-node-app                           0.0s

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/sci2bo7xihgwnfihigd8x9uh1

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview

Docker Scout analyzes the contents of container images and generates a report of packages and vulnerabilities that it detects, helping users to identify and remediate issues. Docker Scout image analysis is more than point-in-time scanning; the analysis gets reevaluated continuously, meaning you don't need to re-scan the image to see an updated vulnerability report. If your base image has a security concern, Docker Scout will check for updates and patches to suggest how to replace the image. If issues exist in other layers, Docker Scout will reveal precisely where they were introduced and make recommendations accordingly (Figure 4).

Figure 4: How Docker Scout works.

Docker Scout uses Software Bills of Materials (SBOMs) to cross-reference with streaming Common Vulnerabilities and Exposures (CVE) data to surface vulnerabilities (and potential remediation recommendations) as soon as possible. An SBOM is a nested inventory, a list of ingredients that make up software components. Docker Scout is built on a streaming, event-driven data model, providing actionable CVE reports. Once an SBOM has been generated, Docker Scout automatically checks it against newly published CVEs, so you will see updates for new CVEs without re-scanning artifacts.

After building the image, we will open Docker Desktop (ensure you have the latest version installed), analyze the level of vulnerabilities, and fix them. We can also use Docker Scout from the Docker CLI, but Docker Desktop gives you a better way to visualize the results. Select Docker Scout from the sidebar and choose the image. Here, we have chosen react-app-multi-stage, which we built just now. As you can see, Scout immediately shows vulnerabilities and their severity levels. We can select View packages and CVEs beside that to take a deeper look and get recommendations (Figure 5).

Figure 5: Docker Scout tab in Docker Desktop.

Now, a window will open, which shows you a detailed report of the vulnerabilities and a layer-wise breakdown (Figure 6).

Figure 6: Detailed report of vulnerabilities.

To get recommendations to fix the image vulnerabilities, select Recommended Fixes in the top-right corner, and a dialog box will open with the recommended fixes. As shown in Figure 7, it recommends upgrading Nginx from version 1.20 to 1.24, which has fewer vulnerabilities and fixes all the critical and higher-level issues. Also note that even though version 1.25 was available, Docker Scout still recommends version 1.24, because 1.25 has critical vulnerabilities that 1.24 does not.

Figure 7: Recommendation tab for fixing vulnerabilities in Docker Desktop.

Now, we need to rebuild our image by changing the base image of the final stage to the recommended version 1.24 (Figure 8), which will fix those vulnerabilities.

Figure 8: Advanced image analysis with Docker Scout.

The key features and capabilities of Docker Scout include:

Unified view: Docker Scout provides a single view of your application's dependencies from all layers, allowing you to easily understand your image composition and identify remediation steps. 
Event-driven vulnerability updates: Docker Scout uses an event-driven data model to continuously detect and surface vulnerabilities, ensuring that analysis is always up-to-date and based on the latest CVEs.

In-context remediation recommendations: Docker Scout provides integrated recommendations visible in Docker Desktop, suggesting remediation options for base image updates and dependency updates within your application code layers.

Note that Docker Scout is available through multiple interfaces, including the Docker Desktop and Docker Hub user interfaces, as well as a web-based user interface and a command-line interface (CLI) plugin. Users can view and interact with Docker Scout through these interfaces to gain a deeper understanding of the composition and security of their container images.

4. Use Docker Content Trust

Docker Content Trust (DCT) lets you sign and verify Docker images, ensuring they come from trusted sources and haven't been tampered with. This process acts like a digital seal of approval for images, whether signed by people or automated processes.

To enable Docker Content Trust, follow these steps:

Initialize Docker Content Trust

Before you can sign images, ensure that Docker Content Trust is initialized. Open a terminal and run the following command:

export DOCKER_CONTENT_TRUST=1

Sign the Docker image

Build and sign the Docker image using the following commands:

docker build -t <your_namespace>/node-app:v1.0 .
docker trust sign <your_namespace>/node-app:v1.0
...
v1.0: digest: sha256:5fa48a9b4e52a9d9681a5786b4885be080668d06019e91eece6dfded5a0f8a47 size: 1986
Signing and pushing trust metadata
Enter passphrase for <namespace> key with ID 96c9857:
Successfully signed docker.io/<your_namespace>/node-app:v1.0

Push the signed image to a registry

You can push the signed Docker image to a registry with:

docker push <your_namespace>/node-app:v1.0

Verify the signature

To verify the signature of an image, use the following command:

docker trust inspect --pretty <your_namespace>/node-app:v1.0

Signatures for <your_namespace>/node-app:v1.0

SIGNED TAG   DIGEST                                            SIGNERS
v1.0         5fa48a9b4e52a9d968XXXXXX19e91eece6dfded5a0f8a47   <your_namespace>

List of signers and their keys for <your_namespace>/node-app:v1.0

SIGNER       KEYS
ajeetraina   96c985786950

Administrative keys for <your_namespace>/node-app:v1.0

  Repository Key: 47214511f851e28018a7b0443XXXXXXc7d5846bf6f7
  Root Key: 52bae142a9ac98a473c5275bXXXXXX2f4f5068081d567903dd

By following these steps, you've enabled Docker Content Trust for your Node.js application, signing and verifying the image to enhance security and ensure the integrity of your containerized application throughout its lifecycle.

5. Practice least privilege

Security is crucial in containerized environments. Embracing the principle of least privilege ensures that Docker containers operate with only the necessary permissions, thereby reducing the attack surface and mitigating potential security risks. Let's explore specific best practices for achieving least privilege in Docker.

Run as non-root user

We minimize potential risks by running applications without unnecessary high-level access (root privileges). Many applications don't need root privileges, so in the Dockerfile we can create a non-root system user and run the application inside the container with that user's limited privileges, improving security and adhering to the principle of least privilege. 
# Stage 1: Build the application
FROM node:21.6-alpine3.18 AS builder

# Set the working directory for the build stage
WORKDIR /app

# Copy package.json and package-lock.json
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy the application source code into the container
COPY . .

# Build the application
RUN npm run build

# Stage 2: Create the final image
FROM nginx:1.20

# Set the working directory within the container
WORKDIR /app

# Set ownership and permissions for nginx user
RUN chown -R nginx:nginx /app && \
    chmod -R 755 /app && \
    chown -R nginx:nginx /var/cache/nginx && \
    chown -R nginx:nginx /var/log/nginx && \
    chown -R nginx:nginx /etc/nginx/conf.d

# Create the nginx PID file and set appropriate permissions
RUN touch /var/run/nginx.pid && \
    chown -R nginx:nginx /var/run/nginx.pid

# Switch to the nginx user
USER nginx

# Copy the built application files from the builder stage to the nginx html directory
COPY --from=builder /app/build /usr/share/nginx/html

# Expose port 80 for the web server
EXPOSE 80

# CMD to start nginx in the foreground
CMD ["nginx", "-g", "daemon off;"]

If we are using Node as the final base image (Figure 9), we can add USER node to our Dockerfile to run the application as a non-root user. The node user is created within the Node image with restricted permissions, unlike the root user, which has full control over the system. By default, the Docker Node image includes a non-root node user that you can use to avoid running your application container as root.

Figure 9: Images tab in Docker Desktop.

Limit capabilities

Limiting Linux kernel capabilities is crucial for controlling the privileges available to containers. Docker, by default, runs with a restricted set of capabilities. You can enhance security by dropping unnecessary capabilities and adding only the ones required.

docker run --cap-drop all --cap-add CHOWN node-app

Let's take our simple Hello World React containerized app and see how it fits in with these least-privilege practices:

FROM node:21.6-alpine3.18

WORKDIR /app

COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 3000

# Capabilities are dropped at run time with the docker run flags shown above,
# not in the Dockerfile, so the CMD only starts the application
CMD ["npm", "start"]

Add the --no-new-privileges flag

Running containers with the --security-opt=no-new-privileges flag is essential to prevent privilege escalation through setuid or setgid binaries. The setuid and setgid binaries allow users to run an executable with the file system permissions of the executable's owner or group, respectively, and to change behavior in directories. This flag ensures that the container's privileges cannot be escalated during runtime.

docker run --security-opt=no-new-privileges node-app

Disable inter-container communication

Inter-container communication (icc) is enabled by default in Docker, allowing containers to communicate using the docker0 bridged network. docker0 bridges your container's network (or any Compose networks) to the host's main network interface, meaning your containers can access the network and you can access the containers. Disabling icc enhances security, requiring explicit communication definitions with --link options. Note that icc is a Docker daemon setting rather than a per-container flag, so it is disabled when starting the daemon:

dockerd --icc=false

Use Linux Security Modules

When you're running applications in Docker containers, you want to make sure they're as secure as possible. One way to do this is by using Linux Security Modules (LSMs), such as seccomp, AppArmor, or SELinux. 
These tools can provide additional layers of protection for Linux systems and containerized applications by controlling which actions a container can perform on the host system: Seccomp is a Linux kernel feature that allows a process to make a one-way transition into a “secure” state where it’s restricted to a reduced set of system calls. It restricts the system calls that a process can make, reducing its attack surface and potential impact if compromised. AppArmor confines individual programs to predefined rules, specifying their allowed behavior and limiting access to files and resources. SELinux enforces mandatory access control policies, defining rules for interactions between processes and system resources to mitigate the risk of privilege escalation and enforce least privilege principles. By leveraging these LSMs, administrators can enhance the security posture of their systems and applications, safeguarding against various threats and vulnerabilities. For instance, when considering a simple Hello World React application containerized within Docker, you may opt to employ the default seccomp profile unless overridden with the --security-opt option. This flexibility enables administrators to explicitly define security policies based on their specific requirements, as demonstrated in the following command: docker run --rm -it --security-opt seccomp=/path/to/seccomp/profile.json node-app Customize seccomp profiles Customizing seccomp profiles at runtime offers several benefits: Flexibility: By separating the seccomp configuration from the Dockerfile, you can adjust the security settings without modifying the image itself. This approach allows for easier experimentation and iteration. Granular control: Custom seccomp profiles let you precisely define which system calls are permitted or denied within your containers. This level of granularity allows you to tailor the security settings to the specific requirements of your application. Security compliance: In environments with strict security requirements, custom seccomp profiles can help ensure compliance by enforcing tighter restrictions on containerized processes. Limit container resources In Docker, containers are granted flexibility to consume CPU and RAM resources up to the extent allowed by the host kernel scheduler. While this flexibility facilitates efficient resource utilization, it also introduces potential risks: Security breaches: In the unfortunate event of a container compromise, attackers could exploit its unrestricted access to host resources for malicious activities. For instance, a compromised container could be exploited to mine cryptocurrency or execute other nefarious actions. Performance bottlenecks: Resource-intensive containers have the potential to monopolize system resources, leading to performance degradation or service outages across your applications. To mitigate these risks effectively, it’s crucial to establish clear resource limits for your containers: Allocate resources wisely: Assign specific amounts of CPU and RAM to each container to ensure fair distribution and prevent resource dominance. Enforce boundaries: Set hard limits that containers cannot exceed, effectively containing potential damage and thwarting resource exhaustion attacks. Promote harmony: Efficient resource management ensures stability, allowing containers to operate smoothly and fulfill their tasks without contention. 
For example, to limit CPU usage, you can run the container with:

docker run -it --cpus=".5" node-app

This command limits the container to use only 50% of a single CPU core. Remember, setting resource limits isn't just about efficiency — it's a vital security measure that safeguards your host system and promotes harmony among your containerized applications.

To prevent potential denial-of-service (DoS) attacks, limiting resources such as memory, CPU, file descriptors, and processes is crucial. Docker provides mechanisms to set these limits for individual containers:

--restart=on-failure:<number_of_restarts>
--ulimit nofile=<number>
--ulimit nproc=<number>

By diligently adhering to these least privilege principles, you can establish a robust security posture for your Docker containers.

6. Choose the right base image

Finding the right image can seem daunting with more than 8.3 million repositories on Docker Hub. Two beacons can help guide you toward safe waters: Docker Official Images (DOI) and Docker Verified Publisher (DVP) badges.

Docker Official Images (marked by a blue badge shield) offer a curated set of open source and drop-in solution repositories. These are your go-to for common bases like Ubuntu, Python, or Nginx. Imagine them as trusty ships, built with quality materials and regularly inspected for seaworthiness.

Docker Verified Publisher Images (signified by a gold check mark) are like trusted partners: organizations that have teamed up with Docker to offer high-quality images. Docker verifies the authenticity and security of their content, giving you extra peace of mind. Think of them as sleek yachts, built by experienced shipwrights and certified by maritime authorities.

Remember that Docker Official Images are a great starting point for common needs, and Verified Publisher images offer an extra layer of trust and security for crucial projects.

Conclusion

Optimizing Docker images for security involves a multifaceted approach, addressing image size, access controls, and vulnerability management. By understanding Docker image layering and leveraging practices such as choosing minimal base images and employing multi-stage builds, developers can significantly enhance efficiency and security. Running applications with least privileges, monitoring vulnerabilities with tools like Docker Scout, and implementing content trust further fortify the containerized ecosystem.

As the Docker landscape evolves, staying informed about best practices and adopting proactive security measures is paramount. This guide serves as a valuable resource, empowering developers and operators to navigate the seas of Docker security with confidence and ensuring their applications are not only functional but also resilient to potential threats.

Learn more

Subscribe to the Docker Newsletter.
Get the latest release of Docker Desktop.
Get started with Docker Scout.
Vote on what's next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

View the full article
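Pulling the least-privilege recommendations from the article above together, here is a minimal docker run sketch for the react-app-multi-stage image built earlier. The capability list and resource limits are illustrative assumptions (nginx-based images typically need CHOWN, SETUID, SETGID, and NET_BIND_SERVICE, but your application may need a different set), so adjust them to what your workload actually requires.

# Run the image with only the capabilities it needs, no privilege escalation,
# and hard CPU, memory, and process limits
docker run -d --name react-app \
  --cap-drop all \
  --cap-add CHOWN --cap-add SETUID --cap-add SETGID --cap-add NET_BIND_SERVICE \
  --security-opt no-new-privileges \
  --cpus "0.5" --memory 256m --pids-limit 100 \
  -p 8080:80 \
  react-app-multi-stage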
  22. This article is part of a project that's split into two main phases. The first phase focuses on building a data pipeline. This involves getting data from an API and storing it in a PostgreSQL database. In the second phase, we'll develop an application that uses a language model to interact with this database.

Ideal for those new to data systems or language model applications, this project is structured into two segments:

This initial article guides you through constructing a data pipeline utilizing Kafka for streaming, Airflow for orchestration, Spark for data transformation, and PostgreSQL for storage. To set up and run these tools, we will use Docker.
The second article, which will come later, will delve into creating agents using tools like LangChain to communicate with external databases.

This first part of the project is ideal for beginners in data engineering, as well as for data scientists and machine learning engineers looking to deepen their knowledge of the entire data handling process. Using these data engineering tools firsthand is beneficial. It helps in refining the creation and expansion of machine learning models, ensuring they perform effectively in practical settings.

This article focuses more on practical application than on the theoretical aspects of the tools discussed. For a detailed understanding of how these tools work internally, there are many excellent resources available online.

Overview

Let's break down the data pipeline process step by step:

Data Streaming: Initially, data is streamed from the API into a Kafka topic.
Data Processing: A Spark job then takes over, consuming the data from the Kafka topic and transferring it to a PostgreSQL database.
Scheduling with Airflow: Both the streaming task and the Spark job are orchestrated using Airflow. While in a real-world scenario the Kafka producer would constantly listen to the API, for demonstration purposes we'll schedule the Kafka streaming task to run daily. Once the streaming is complete, the Spark job processes the data, making it ready for use by the LLM application.

All of these tools will be built and run using Docker, and more specifically Docker Compose.

Overview of the data pipeline. Image by the author.

Now that we have a blueprint of our pipeline, let's dive into the technical details!

Local setup

First, you can clone the GitHub repo on your local machine using the following command:

git clone https://github.com/HamzaG737/data-engineering-project.git

Here is the overall structure of the project:

├── LICENSE
├── README.md
├── airflow
│   ├── Dockerfile
│   ├── __init__.py
│   └── dags
│       ├── __init__.py
│       └── dag_kafka_spark.py
├── data
│   └── last_processed.json
├── docker-compose-airflow.yaml
├── docker-compose.yml
├── kafka
├── requirements.txt
├── spark
│   └── Dockerfile
└── src
    ├── __init__.py
    ├── constants.py
    ├── kafka_client
    │   ├── __init__.py
    │   └── kafka_stream_data.py
    └── spark_pgsql
        └── spark_streaming.py

The airflow directory contains a custom Dockerfile for setting up Airflow and a dags directory to create and schedule the tasks.
The data directory contains the last_processed.json file, which is crucial for the Kafka streaming task. Further details on its role will be provided in the Kafka section.
The docker-compose-airflow.yaml file defines all the services required to run Airflow.
The docker-compose.yml file specifies the Kafka services and includes a docker-proxy. 
This proxy is essential for executing Spark jobs through a DockerOperator in Airflow, a concept that will be elaborated on later.
The spark directory contains a custom Dockerfile for the Spark setup.
src contains the Python modules needed to run the application.

To set up your local development environment, start by installing the required Python packages. The only essential package is psycopg2-binary. You have the option to install just this package or all the packages listed in the requirements.txt file. To install all packages, use the following command:

pip install -r requirements.txt

Next, let's dive step by step into the project details.

About the API

The API is RappelConso from the French public services. It gives access to data relating to recalls of products declared by professionals in France. The data is in French and it initially contains 31 columns (or fields). Some of the most important are:

reference_fiche (reference sheet): Unique identifier of the recalled product. It will act as the primary key of our Postgres database later.
categorie_de_produit (Product category): For instance food, electrical appliance, tools, transport means, etc.
sous_categorie_de_produit (Product sub-category): For instance we can have meat, dairy products, cereals as sub-categories for the food category.
motif_de_rappel (Reason for recall): Self-explanatory and one of the most important fields.
date_de_publication, which translates to the publication date.
risques_encourus_par_le_consommateur, which contains the risks that the consumer may encounter when using the product.
There are also several fields that correspond to different links, such as a link to the product image, a link to the distributors list, etc.

You can see some examples and manually query the dataset records using this link.

We refined the data columns in a few key ways:

Columns like ndeg_de_version and rappelguid, which were part of a versioning system, have been removed as they aren't needed for our project.
We combined the columns that deal with consumer risks — risques_encourus_par_le_consommateur and description_complementaire_du_risque — for a clearer overview of product risks.
The date_debut_fin_de_commercialisation column, which indicates the marketing period, has been divided into two separate columns. This split allows for easier queries about the start or end of a product's marketing.
We've removed accents from all columns except for links, reference numbers, and dates. This is important because some text processing tools struggle with accented characters.

For a detailed look at these changes, check out our transformation script at src/kafka_client/transformations.py. The updated list of columns is available in src/constants.py under DB_FIELDS.

Kafka streaming

To avoid sending all the data from the API each time we run the streaming task, we define a local JSON file that contains the last publication date of the latest streaming run. We then use this date as the starting date for our new streaming task.

To give an example, suppose that the latest recalled product has a publication date of 22 November 2023. If we assume that all of the recalled product information before this date is already persisted in our Postgres database, we can now stream the data starting from 22 November. Note that there is an overlap, because we may have a scenario where we didn't handle all of the data from the 22nd of November. 
The file is saved in ./data/last_processed.json and has this format:

{"last_processed": "2023-11-22"}

By default, the file is an empty JSON object, which means that our first streaming task will process all of the API records, which number approximately 10,000. Note that in a production setting, this approach of storing the last processed date in a local file is not viable, and other approaches involving an external database or an object storage service may be more suitable.

The code for the Kafka streaming can be found in ./src/kafka_client/kafka_stream_data.py, and it primarily involves querying the data from the API, making the transformations, removing potential duplicates, updating the last publication date, and serving the data using the Kafka producer.

The next step is to run the Kafka service defined in the docker-compose file below:

version: '3'
services:
  kafka:
    image: 'bitnami/kafka:latest'
    ports:
      - '9094:9094'
    networks:
      - airflow-kafka
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,EXTERNAL://localhost:9094
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT,PLAINTEXT:PLAINTEXT
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
    volumes:
      - ./kafka:/bitnami/kafka

  kafka-ui:
    container_name: kafka-ui-1
    image: provectuslabs/kafka-ui:latest
    ports:
      - 8800:8080
    depends_on:
      - kafka
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: PLAINTEXT://kafka:9092
      DYNAMIC_CONFIG_ENABLED: 'true'
    networks:
      - airflow-kafka

networks:
  airflow-kafka:
    external: true

The key highlights from this file are:

The kafka service uses the base image bitnami/kafka.
We configure the service with only one broker, which is enough for our small project. A Kafka broker is responsible for receiving messages from producers (which are the sources of data), storing these messages, and delivering them to consumers (which are the sinks or end-users of the data). The broker listens on port 9092 for internal communication within the cluster and on port 9094 for external communication, allowing clients outside the Docker network to connect to the Kafka broker.
In the volumes part, we map the local directory kafka to the Docker container directory /bitnami/kafka to ensure data persistence and allow possible inspection of Kafka's data from the host system.
We set up the service kafka-ui, which uses the Docker image provectuslabs/kafka-ui:latest. This provides a user interface to interact with the Kafka cluster, which is especially useful for monitoring and managing Kafka topics and messages.
To ensure communication between Kafka and Airflow, which will be run as an external service, we will use an external network, airflow-kafka.

Before running the Kafka service, let's create the airflow-kafka network using the following command:

docker network create airflow-kafka

Now everything is set to finally start our Kafka service:

docker-compose up

After the services start, visit the kafka-ui at http://localhost:8800/. Normally you should get something like this:

Overview of the Kafka UI. Image by the author.

Next we will create our topic that will contain the API messages. Click on Topics on the left and then Add a topic at the top left. Our topic will be called rappel_conso, and since we have only one broker, we set the replication factor to 1. 
We will also set the partitions number to 1 since we will have only one consumer thread at a time so we won’t need any parallelism. Finally, we can set the time to retain data to a small number like one hour since we will run the spark job right after the kafka streaming task, so we won’t need to retain the data for a long time in the kafka topic. Postgres set-upBefore setting-up our spark and airflow configurations, let’s create the Postgres database that will persist our API data. I used the pgadmin 4 tool for this task, however any other Postgres development platform can do the job. To install postgres and pgadmin, visit this link https://www.postgresql.org/download/ and get the packages following your operating system. Then when installing postgres, you need to setup a password that we will need later to connect to the database from the spark environment. You can also leave the port at 5432. If your installation has succeeded, you can start pgadmin and you should observe something like this window: Overview of pgAdmin interface. Image by the author.Since we have a lot of columns for the table we want to create, we chose to create the table and add its columns with a script using psycopg2, a PostgreSQL database adapter for Python. You can run the script with the command: python scripts/create_table.pyNote that in the script I saved the postgres password as environment variable and name it POSTGRES_PASSWORD. So if you use another method to access the password you need to modify the script accordingly. Spark Set-upHaving set-up our Postgres database, let’s delve into the details of the spark job. The goal is to stream the data from the Kafka topic rappel_conso to the Postgres table rappel_conso_table. from pyspark.sql import SparkSession from pyspark.sql.types import ( StructType, StructField, StringType, ) from pyspark.sql.functions import from_json, col from src.constants import POSTGRES_URL, POSTGRES_PROPERTIES, DB_FIELDS import logging logging.basicConfig( level=logging.INFO, format="%(asctime)s:%(funcName)s:%(levelname)s:%(message)s" ) def create_spark_session() -> SparkSession: spark = ( SparkSession.builder.appName("PostgreSQL Connection with PySpark") .config( "spark.jars.packages", "org.postgresql:postgresql:42.5.4,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0", ) .getOrCreate() ) logging.info("Spark session created successfully") return spark def create_initial_dataframe(spark_session): """ Reads the streaming data and creates the initial dataframe accordingly. """ try: # Gets the streaming data from topic random_names df = ( spark_session.readStream.format("kafka") .option("kafka.bootstrap.servers", "kafka:9092") .option("subscribe", "rappel_conso") .option("startingOffsets", "earliest") .load() ) logging.info("Initial dataframe created successfully") except Exception as e: logging.warning(f"Initial dataframe couldn't be created due to exception: {e}") raise return df def create_final_dataframe(df): """ Modifies the initial dataframe, and creates the final dataframe. 
""" schema = StructType( [StructField(field_name, StringType(), True) for field_name in DB_FIELDS] ) df_out = ( df.selectExpr("CAST(value AS STRING)") .select(from_json(col("value"), schema).alias("data")) .select("data.*") ) return df_out def start_streaming(df_parsed, spark): """ Starts the streaming to table spark_streaming.rappel_conso in postgres """ # Read existing data from PostgreSQL existing_data_df = spark.read.jdbc( POSTGRES_URL, "rappel_conso", properties=POSTGRES_PROPERTIES ) unique_column = "reference_fiche" logging.info("Start streaming ...") query = df_parsed.writeStream.foreachBatch( lambda batch_df, _: ( batch_df.join( existing_data_df, batch_df[unique_column] == existing_data_df[unique_column], "leftanti" ) .write.jdbc( POSTGRES_URL, "rappel_conso", "append", properties=POSTGRES_PROPERTIES ) ) ).trigger(once=True) \ .start() return query.awaitTermination() def write_to_postgres(): spark = create_spark_session() df = create_initial_dataframe(spark) df_final = create_final_dataframe(df) start_streaming(df_final, spark=spark) if __name__ == "__main__": write_to_postgres()Let’s break down the key highlights and functionalities of the spark job: First we create the Spark sessiondef create_spark_session() -> SparkSession: spark = ( SparkSession.builder.appName("PostgreSQL Connection with PySpark") .config( "spark.jars.packages", "org.postgresql:postgresql:42.5.4,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0", ) .getOrCreate() ) logging.info("Spark session created successfully") return spark2. The create_initial_dataframe function ingests streaming data from the Kafka topic using Spark's structured streaming. def create_initial_dataframe(spark_session): """ Reads the streaming data and creates the initial dataframe accordingly. """ try: # Gets the streaming data from topic random_names df = ( spark_session.readStream.format("kafka") .option("kafka.bootstrap.servers", "kafka:9092") .option("subscribe", "rappel_conso") .option("startingOffsets", "earliest") .load() ) logging.info("Initial dataframe created successfully") except Exception as e: logging.warning(f"Initial dataframe couldn't be created due to exception: {e}") raise return df3. Once the data is ingested, create_final_dataframe transforms it. It applies a schema (defined by the columns DB_FIELDS) to the incoming JSON data, ensuring that the data is structured and ready for further processing. def create_final_dataframe(df): """ Modifies the initial dataframe, and creates the final dataframe. """ schema = StructType( [StructField(field_name, StringType(), True) for field_name in DB_FIELDS] ) df_out = ( df.selectExpr("CAST(value AS STRING)") .select(from_json(col("value"), schema).alias("data")) .select("data.*") ) return df_out4. The start_streaming function reads existing data from the database, compares it with the incoming stream, and appends new records. 
def start_streaming(df_parsed, spark): """ Starts the streaming to table spark_streaming.rappel_conso in postgres """ # Read existing data from PostgreSQL existing_data_df = spark.read.jdbc( POSTGRES_URL, "rappel_conso", properties=POSTGRES_PROPERTIES ) unique_column = "reference_fiche" logging.info("Start streaming ...") query = df_parsed.writeStream.foreachBatch( lambda batch_df, _: ( batch_df.join( existing_data_df, batch_df[unique_column] == existing_data_df[unique_column], "leftanti" ) .write.jdbc( POSTGRES_URL, "rappel_conso", "append", properties=POSTGRES_PROPERTIES ) ) ).trigger(once=True) \ .start() return query.awaitTermination()The complete code for the Spark job is in the file src/spark_pgsql/spark_streaming.py. We will use the Airflow DockerOperator to run this job, as explained in the upcoming section. Let’s go through the process of creating the Docker image we need to run our Spark job. Here’s the Dockerfile for reference: FROM bitnami/spark:latest WORKDIR /opt/bitnami/spark RUN pip install py4j COPY ./src/spark_pgsql/spark_streaming.py ./spark_streaming.py COPY ./src/constants.py ./src/constants.py ENV POSTGRES_DOCKER_USER=host.docker.internal ARG POSTGRES_PASSWORD ENV POSTGRES_PASSWORD=$POSTGRES_PASSWORDIn this Dockerfile, we start with the bitnami/spark image as our base. It's a ready-to-use Spark image. We then install py4j, a tool needed for Spark to work with Python. The environment variables POSTGRES_DOCKER_USER and POSTGRES_PASSWORD are set up for connecting to a PostgreSQL database. Since our database is on the host machine, we use host.docker.internal as the user. This allows our Docker container to access services on the host, in this case, the PostgreSQL database. The password for PostgreSQL is passed as a build argument, so it's not hard-coded into the image. It’s important to note that this approach, especially passing the database password at build time, might not be secure for production environments. It could potentially expose sensitive information. In such cases, more secure methods like Docker BuildKit should be considered. Now, let’s build the Docker image for Spark: docker build -f spark/Dockerfile -t rappel-conso/spark:latest --build-arg POSTGRES_PASSWORD=$POSTGRES_PASSWORD .This command will build the image rappel-conso/spark:latest . This image includes everything needed to run our Spark job and will be used by Airflow’s DockerOperator to execute the job. Remember to replace $POSTGRES_PASSWORD with your actual PostgreSQL password when running this command. AirflowAs said earlier, Apache Airflow serves as the orchestration tool in the data pipeline. It is responsible for scheduling and managing the workflow of the tasks, ensuring they are executed in a specified order and under defined conditions. In our system, Airflow is used to automate the data flow from streaming with Kafka to processing with Spark. Airflow DAGLet’s take a look at the Directed Acyclic Graph (DAG) that will outline the sequence and dependencies of tasks, enabling Airflow to manage their execution. 
start_date = datetime.today() - timedelta(days=1)

default_args = {
    "owner": "airflow",
    "start_date": start_date,
    "retries": 1,  # number of retries before failing the task
    "retry_delay": timedelta(seconds=5),
}

with DAG(
    dag_id="kafka_spark_dag",
    default_args=default_args,
    schedule_interval=timedelta(days=1),
    catchup=False,
) as dag:

    kafka_stream_task = PythonOperator(
        task_id="kafka_data_stream",
        python_callable=stream,
        dag=dag,
    )

    spark_stream_task = DockerOperator(
        task_id="pyspark_consumer",
        image="rappel-conso/spark:latest",
        api_version="auto",
        auto_remove=True,
        command="./bin/spark-submit --master local[*] --packages org.postgresql:postgresql:42.5.4,org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 ./spark_streaming.py",
        docker_url='tcp://docker-proxy:2375',
        environment={'SPARK_LOCAL_HOSTNAME': 'localhost'},
        network_mode="airflow-kafka",
        dag=dag,
    )

    kafka_stream_task >> spark_stream_task

Here are the key elements from this configuration:

The tasks are set to execute daily.
The first task is the Kafka Stream Task. It is implemented using the PythonOperator to run the Kafka streaming function. This task streams data from the RappelConso API into a Kafka topic, initiating the data processing workflow.
The downstream task is the Spark Stream Task. It uses the DockerOperator for execution. It runs a Docker container with our custom Spark image, tasked with processing the data received from Kafka.
The tasks are arranged sequentially, where the Kafka streaming task precedes the Spark processing task. This order is crucial to ensure that data is first streamed and loaded into Kafka before being processed by Spark.

About the DockerOperator

Using the DockerOperator allows us to run Docker containers that correspond to our tasks. The main advantages of this approach are easier package management, better isolation, and enhanced testability. We will demonstrate the use of this operator with the Spark streaming task. Here are some key details about the DockerOperator for the Spark streaming task:

We will use the image rappel-conso/spark:latest specified in the Spark Set-up section.
The command will run the Spark submit command inside the container, specifying the master as local, including the necessary packages for PostgreSQL and Kafka integration, and pointing to the spark_streaming.py script that contains the logic for the Spark job.
docker_url represents the URL of the host running the Docker daemon. The natural solution is to set it as unix://var/run/docker.sock and to mount /var/run/docker.sock in the Airflow Docker container. One problem we had with this approach is a permission error when using the socket file inside the Airflow container. A common workaround, changing permissions with chmod 777 /var/run/docker.sock, poses significant security risks. To circumvent this, we implemented a more secure solution using bobrik/socat as a docker-proxy. This proxy, defined in a Docker Compose service, listens on TCP port 2375 and forwards requests to the Docker socket:

  docker-proxy:
    image: bobrik/socat
    command: "TCP4-LISTEN:2375,fork,reuseaddr UNIX-CONNECT:/var/run/docker.sock"
    ports:
      - "2376:2375"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - airflow-kafka

In the DockerOperator, we can access the host Docker socket /var/run/docker.sock via the tcp://docker-proxy:2375 URL, as described here and here.
Finally, we set the network mode to airflow-kafka. This allows us to use the same network as the proxy and the Docker container running Kafka. 
This is crucial since the Spark job will consume the data from the Kafka topic, so we must ensure that both containers are able to communicate.

After defining the logic of our DAG, let's now look at the Airflow services configuration in the docker-compose-airflow.yaml file.

Airflow configuration

The compose file for Airflow was adapted from the official Apache Airflow docker-compose file. You can have a look at the original file by visiting this link. As pointed out by this article, this proposed version of Airflow is highly resource-intensive, mainly because the core executor is set to CeleryExecutor, which is better suited to distributed and large-scale data processing tasks. Since we have a small workload, using a single-node LocalExecutor is enough.

Here is an overview of the changes we made to the docker-compose configuration of Airflow:

We set the environment variable AIRFLOW__CORE__EXECUTOR to LocalExecutor.
We removed the services airflow-worker and flower because they only work for the Celery executor. We also removed the redis caching service since it works as a backend for Celery. We also won't use the airflow-triggerer, so we remove it too.
We replaced the base image ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.7.3} for the remaining services, mainly the scheduler and the webserver, with a custom image that we will build when running docker-compose.

version: '3.8'
x-airflow-common:
  &airflow-common
  build:
    context: .
    dockerfile: ./airflow_resources/Dockerfile
  image: de-project/airflow:latest

We mounted the necessary volumes that are needed by Airflow. AIRFLOW_PROJ_DIR designates the Airflow project directory that we will define later. We also set the network to airflow-kafka to be able to communicate with the Kafka bootstrap servers.

  volumes:
    - ${AIRFLOW_PROJ_DIR:-.}/dags:/opt/airflow/dags
    - ${AIRFLOW_PROJ_DIR:-.}/logs:/opt/airflow/logs
    - ${AIRFLOW_PROJ_DIR:-.}/config:/opt/airflow/config
    - ./src:/opt/airflow/dags/src
    - ./data/last_processed.json:/opt/airflow/data/last_processed.json
  user: "${AIRFLOW_UID:-50000}:0"
  networks:
    - airflow-kafka

Next, we need to create some environment variables that will be used by docker-compose:

echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_PROJ_DIR=\"./airflow_resources\"" > .env

Here AIRFLOW_UID represents the User ID in the Airflow containers and AIRFLOW_PROJ_DIR represents the Airflow project directory.

Now everything is set up to run your Airflow service. You can start it with this command:

docker compose -f docker-compose-airflow.yaml up

Then, to access the Airflow user interface, you can visit http://localhost:8080.

Sign-in window on Airflow. Image by the author.

By default, the username and password are airflow for both. After signing in, you will see a list of DAGs that come with Airflow. Look for the DAG of our project, kafka_spark_dag, and click on it.

Overview of the task window in Airflow. Image by the author.

You can start the task by clicking on the button next to DAG: kafka_spark_dag. Next, you can check the status of your tasks in the Graph tab. A task is done when it turns green. So, when everything is finished, it should look something like this:

Image by the author.

To verify that the rappel_conso_table is filled with data, use the following SQL query in the pgAdmin Query Tool:

SELECT count(*) FROM rappel_conso_table

When I ran this in January 2024, the query returned a total of 10022 rows. Your results should be around this number as well. 
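If you prefer the command line over the Airflow UI and pgAdmin, the same steps can be scripted. This is only a sketch: the airflow-webserver service name comes from the official Airflow compose file that the setup above is adapted from, and rappel_conso_db is a placeholder for whatever database name you created in Postgres.

# Trigger the DAG from the Airflow CLI inside the webserver container
docker compose -f docker-compose-airflow.yaml exec airflow-webserver \
  airflow dags trigger kafka_spark_dag

# Once the run turns green, count the persisted rows directly from the host
psql -h localhost -p 5432 -U postgres -d rappel_conso_db \
  -c "SELECT count(*) FROM rappel_conso_table;"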
Conclusion

This article has successfully demonstrated the steps to build a basic yet functional data engineering pipeline using Kafka, Airflow, Spark, PostgreSQL, and Docker. Aimed primarily at beginners and those new to the field of data engineering, it provides a hands-on approach to understanding and implementing key concepts in data streaming, processing, and storage.

Throughout this guide, we've covered each component of the pipeline in detail, from setting up Kafka for data streaming to using Airflow for task orchestration, and from processing data with Spark to storing it in PostgreSQL. The use of Docker throughout the project simplifies the setup and ensures consistency across different environments.

It's important to note that while this setup is ideal for learning and small-scale projects, scaling it for production use would require additional considerations, especially in terms of security and performance optimization. Future enhancements could include integrating more advanced data processing techniques, exploring real-time analytics, or even expanding the pipeline to incorporate more complex data sources.

In essence, this project serves as a practical starting point for those looking to get their hands dirty with data engineering. It lays the groundwork for understanding the basics, providing a solid foundation for further exploration in the field.

In the second part, we'll explore how to effectively use the data stored in our PostgreSQL database. We'll introduce agents powered by Large Language Models (LLMs) and a variety of tools that enable us to interact with the database using natural language queries. So, stay tuned!

To reach out

LinkedIn: https://www.linkedin.com/in/hamza-gharbi-043045151/
Twitter: https://twitter.com/HamzaGh25079790

End-to-End Data Engineering System on Real Data with Kafka, Spark, Airflow, Postgres, and Docker was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

View the full article
  23. We’re pleased to announce Docker Desktop 4.27, packed with exciting new features and updates. The new release includes key advancements such as synchronized file shares, collaboration enhancements in Docker Build Cloud, the introduction of the private marketplace for extensions (available for Docker Business customers), and the much-anticipated release of Moby 25. Additionally, we explore the support for Testcontainers with Enhanced Container Isolation, the general availability of docker init with expanded language support, and the beta release of Docker Debug. These updates represent significant strides in improving development workflows, enhancing security, and offering advanced customization for Docker users. Docker Desktop synchronized file shares GA We’re diving into some fantastic updates for Docker Desktop, and we’re especially thrilled to introduce our latest feature, synchronized file shares, which is available now in version 4.27 (Figure 1). Following our acquisition announcement in June 2023, we have integrated the technology behind Mutagen into the core of Docker Desktop. You can now say goodbye to the challenges of using large codebases in containers with virtual filesystems. Synchronized file shares unlock native filesystem performance for bind mounts and provides a remarkable 2-10x boost in file operation speeds. For developers managing extensive codebases, this is a game-changer. Figure 1: Shares have been created and are available for use in containers. To get started, log in to Docker Desktop with your subscription account (Pro, Teams, or Business) to harness the power of Docker Desktop synchronized file shares. You can read more about this feature in the Docker documentation. Collaborate on shared Docker Build Cloud builds in Docker Desktop With the recent GA of Docker Build Cloud, your team can now leverage Docker Desktop to use powerful cloud-based build machines and shared caching to reduce unnecessary rebuilds and get your build done in a fraction of the time, regardless of your local physical hardware. New builds can make instant use of the shared cache. Even if this is your first time building the project, you can immediately speed up build times with shared caches. We know that team members have varying levels of Docker expertise. When a new developer has issues with their build failing, the Builds view makes it effortless for anyone on the team to locate the troublesome build using search and filtering. They can then collaborate on a fix and get unblocked in no time. When all your team is building on the same cloud builder, it can get noisy, so we added filtering by specific build types, helping you focus on the builds that are important to you. Link to builder settings for a build Previously, to access builder settings, you had to jump back to the build list or the settings page, but now you can access them directly from a build (Figure 2). Figure 2: Access builder settings directly from a build. Delete build history for a builder And, until now you could only delete build in batches, which meant if you wanted to clear the build history it required a lot of clicks. This update enables you to clear all builds easily (Figure 3). Figure 3: Painlessly clear the build history for an individual builder. Refresh storage data for your builder at any point in time Refreshing the storage data is an intensive operation, so it only happens periodically. Previously, when you were clearing data, you would have to wait a while to see the update. 
Now it’s just a one-click process (Figure 4).

Figure 4: Quickly refresh storage data for a builder to get an up-to-date view of your usage.

New feature: Private marketplace for extensions available for Docker Business subscribers

Docker Business customers now have exclusive access to a new feature: the private marketplace for extensions. This enhancement focuses on security, compliance, and customization, empowering developers by providing:

Controlled access: Manage which extensions developers can use through allow-listing.
Private distribution: Easily distribute company-specific extensions from a private registry.
Customized development: Deploy customized team processes and tools as unpublished/private Docker extensions tailored to a specific organization.

The private marketplace for extensions enables a secure, efficient, and tailored development environment, aligning with your enterprise’s specific needs. Get started today by learning how to configure a private marketplace for extensions.

Moby 25 release — containerd image store

We are happy to announce the release of Moby 25.0 with Docker Desktop 4.27. In case you’re unfamiliar, Moby is the open source project for Docker Engine, which ships in Docker Desktop. We have dedicated significant effort to this release, which marks a major milestone for the open source Moby project. You can read a comprehensive list of enhancements in the v25.0.0 release notes.

With the release of Docker Desktop 4.27, support for the containerd image store has graduated from beta to general availability. This work began in September 2022, when we started extending the Docker Engine integration with containerd, so we are excited to have this functionality reach general availability. This support provides a more robust user experience by natively storing and building multi-platform images and by using snapshotters for lazily pulling images (e.g., stargz) and peer-to-peer image distribution (e.g., dragonfly, nydus). It also provides a foundation for you to run Wasm containers (currently in beta).

The containerd image store is not currently enabled by default for all users, but it can be enabled in the general settings in Docker Desktop under Use containerd for pulling and storing images (Figure 5).

Figure 5: Enable use of the containerd image store in the general settings in Docker Desktop.

Going forward, we will continue improving the user experience of pushing, pulling, and storing images with the containerd image store, help migrate user images to use containerd, and work toward enabling it by default for all users. As always, you can try any of the features landing in Moby 25 in Docker Desktop.

Support for Testcontainers with Enhanced Container Isolation

Docker Desktop 4.27 introduces the ability to use the popular Testcontainers framework with Enhanced Container Isolation (ECI). ECI, which is available to Docker Business customers, provides an additional layer of security to prevent malicious workloads running in containers from compromising Docker Desktop or the host. It does this by running containers without root access to the Docker Desktop VM, vetting sensitive system calls inside containers, and using other advanced techniques. It’s meant to better secure local development environments. Before Docker Desktop 4.27, ECI blocked mounting the Docker Engine socket into containers to increase security and prevent malicious containers from gaining access to Docker Engine.
However, this also prevented legitimate scenarios (such as Testcontainers) from working with ECI. Starting with Docker Desktop 4.27, admins can now configure ECI to allow Docker socket mounts, but in a controlled way (e.g., only on trusted images of their choice), and can even restrict the commands that may be sent on that socket. This functionality, in turn, enables users to enjoy the combined benefits of frameworks such as Testcontainers (or any others that require containers to access the Docker Engine socket) with the extra security and peace of mind provided by ECI.

Docker init GA with Java support

Initially released in its beta form in Docker Desktop 4.18, docker init has undergone several enhancements. The docker init command-line utility aids in the initialization of Docker resources within a project. It automatically generates Dockerfiles, Compose files, and .dockerignore files based on the nature of the project, significantly reducing the setup time and complexity associated with Docker configurations. The initial beta release of docker init only supported Go and generic projects. The latest version, available in Docker Desktop 4.27, supports Go, Python, Node.js, Rust, ASP.NET, PHP, and Java (Figure 6).

Figure 6: Docker init will suggest the best template for the application.

The general availability of docker init offers an efficient and user-friendly way to integrate Docker into your projects. Whether you’re a seasoned Docker user or new to containerization, docker init is ready to enhance your development workflow.

Beta release of Docker Debug

As previously announced at DockerCon 2023, Docker Debug is now available as a beta offering in Docker Desktop 4.27.

Figure 7: Docker Debug.

Developers can spend as much as 60% of their time debugging their applications, with much of that time taken up by sorting out and configuring tools instead of actually debugging. Docker Debug (available with Pro, Teams, or Business subscriptions) provides a language-independent, integrated toolbox for debugging local and remote containerized apps — even when the container fails to launch — enabling developers to find and solve problems faster. To get started, run docker debug <Container or Image name> in the Docker Desktop CLI while logged in with your subscription account.

Conclusion

Docker Desktop’s latest updates and features, from synchronized file shares to the first beta release of Docker Debug, reflect our ongoing commitment to enhancing developer productivity and operational efficiency. Integrating these capabilities into Docker Desktop streamlines development processes and empowers teams to collaborate more effectively and securely. As Docker continues to evolve, we remain dedicated to providing our community and customers with innovative solutions that address the dynamic needs of modern software development. Stay tuned for further updates and enhancements, and as always, we encourage you to explore these new features to see how they can benefit your development workflow. Upgrade to Docker Desktop 4.27 to explore these updates and experiment with Docker’s latest features.

Learn more

Read the Docker Desktop Release Notes.
Install and authenticate against the latest release of Docker Desktop.
Learn more about synchronized file shares.
Check out Docker Build Cloud.
Read Streamline Dockerization with Docker Init GA.
Read Docker Init: Initialize Dockerfiles and Compose files with a single CLI command.
Have questions? The Docker community is here to help.
New to Docker? Get started.

View the full article
  24. This post was contributed by Thierry Moreau, co-founder and head of DevRel at OctoAI.

Generative AI models have shown immense potential over the past year with breakthrough models like GPT-3.5, DALL-E, and more. In particular, open source foundational models have gained traction among developers and enterprise users who appreciate how customizable, cost-effective, and transparent these models are compared to closed-source alternatives.

In this article, we’ll explore how you can compose open source foundational models into a streamlined image transformation pipeline that lets you manipulate images with nothing but text to achieve surprisingly good results. With this approach, you can create fun versions of corporate logos, bring your kids’ drawings to life, enrich your product photography, or even remodel your living room (Figure 1).

Figure 1: Examples of image transformation including, from left to right: generating a creative corporate logo, bringing children’s drawings to life, enriching commercial photography, and remodeling your living room.

Pretty cool, right? Behind the scenes, a lot needs to happen, and we’ll walk step by step through how to reproduce these results yourself. We call the multimodal GenAI pipeline OctoShop as a nod to the popular image editing software. Feeling inspired to string together some foundational GenAI models? Let’s dive into the technology that makes this possible.

Architecture overview

Let’s look more closely at the open source foundational GenAI models that compose the multimodal pipeline we’re about to build. Going forward, we’ll use the term “model cocktail” instead of “multimodal GenAI model pipeline,” as it flows a bit better (and sounds tastier, too). A model cocktail is a mix of GenAI models that can process and generate data across multiple modalities: text and images are examples of data modalities across which GenAI models consume and produce data, but the concept can also extend to audio and video (Figure 2). To build on the analogy of crafting a cocktail (or mocktail, if you prefer), you’ll need to mix ingredients, which, when assembled, are greater than the sum of their individual parts.

Figure 2: The multimodal GenAI workflow — by taking an image and text, this pipeline transforms the input image according to the text prompt.

Let’s use a Negroni, for example — my favorite cocktail. It’s easy to prepare; you need equal parts of gin, vermouth, and Campari. Similarly, our OctoShop model cocktail will use three ingredients: an equal mix of image-generation (SDXL), text-generation (Mistral-7B), and custom image-to-text generation (CLIP Interrogator) models. The process is as follows:

CLIP Interrogator takes in an image and generates a textual description (e.g., “a whale with a container on its back”).
An LLM, Mistral-7B, then rewrites that description based on a user prompt (e.g., “set the image into space”), producing a richer description that meets the prompt (e.g., “in the vast expanse of space, a majestic whale carries a container on its back”).
Finally, an SDXL model generates the final AI-generated image based on the textual description transformed by the LLM.

We also take advantage of SDXL styles and a ControlNet to better control the output image in terms of style and framing/perspective.

Prerequisites

Let’s go over the prerequisites for crafting our cocktail.
Here’s what you’ll need:

Sign up for an OctoAI account to use OctoAI’s image generation (SDXL), text generation (Mistral-7B), and compute solutions (CLIP Interrogator) — OctoAI serves as the bar from which to get all of the ingredients you’ll need to craft your model cocktail. If you’re already using a different compute service, feel free to bring that instead.
Run a Jupyter notebook to craft the right mix of GenAI models. This is your place for experimenting and mixing, so this will be your cocktail shaker. To make it easy to run and distribute the notebook, we’ll use Google Colab.
Finally, we’ll deploy our model cocktail as a Streamlit app. Think of building your app and embellishing the frontend as the presentation of your cocktail (e.g., glass, ice, and choice of garnish) to enhance your senses.

Getting started with OctoAI

Head to octoai.cloud and create an account if you haven’t done so already. You’ll receive $10 in credits upon signing up for the first time, which should be sufficient for you to experiment with your own workflow here. Follow the instructions on the Getting Started page to obtain an OctoAI API token — this will help you get authenticated whenever you use the OctoAI APIs.

Notebook walkthrough

We’ve built a Jupyter notebook in Colab to help you learn how to use the different models that will constitute your model cocktail. Here are the steps to follow:

1. Launch the notebook

Get started by launching the following Colab notebook. There’s no need to change the runtime type or rely on a GPU or TPU accelerator — all we need is a CPU here, given that all of the AI heavy lifting is done on OctoAI endpoints.

2. OctoAI SDK setup

Let’s get started by installing the OctoAI SDK. You’ll use the SDK to invoke the different open source foundational models we’re using, like SDXL and Mistral-7B. You can install it through pip:

# Install the OctoAI SDK
!pip install octoai-sdk

In some cases, you may get a message about pip packages having been previously imported in the runtime, causing an error. If that’s the case, selecting the Restart Session button at the bottom should take care of the package versioning issues. After this, you should be able to re-run the cell that pip-installs the OctoAI SDK without any issues.

3. Generate images with SDXL

You’ll first learn to generate an image with SDXL using the Image Generation solution API. To learn more about what each parameter does in the code below, check out OctoAI’s ImageGenerator client. In particular, the ImageGenerator API takes several arguments to generate an image:

Engine: Lets you choose between versions of Stable Diffusion models, such as SDXL, SD1.5, and SSD.
Prompt: Describes the image you want to generate.
Negative prompt: Describes the traits you want to avoid in the final image.
Width, height: The resolution of the output image.
Num images: The number of images to generate at once.
Sampler: Determines the sampling method used to denoise your image. If you’re not familiar with this process, this article provides a comprehensive overview.
Number of steps: Number of denoising steps — the more steps, the higher the quality, but generally going past 30 will lead to diminishing returns.
Cfg scale: How closely to adhere to the image description — generally stays around 7-12.
Use refiner: Whether to apply the SDXL refiner model, which improves the output quality of the image.
Seed: A parameter that lets you control the reproducibility of image generation (set to a positive value to always get the same image given stable input parameters).
Note that tweaking the image generation parameters — like number of steps, number of images, sampler used, etc. — affects the amount of GPU compute needed to generate an image. Increasing GPU cycles will affect the pricing of generating the image. Here’s an example using simple parameters:

# Import the OctoAI image generation client
from octoai.clients.image_gen import Engine, ImageGenerator

# Now let's use the OctoAI Image Generation API to generate
# an image of a whale with a container on its back to recreate
# the moby logo
image_gen = ImageGenerator(token=OCTOAI_API_TOKEN)
image_gen_response = image_gen.generate(
    engine=Engine.SDXL,
    prompt="a whale with a container on its back",
    negative_prompt="blurry photo, distortion, low-res, poor quality",
    width=1024,
    height=1024,
    num_images=1,
    sampler="DPM_PLUS_PLUS_2M_KARRAS",
    steps=20,
    cfg_scale=7.5,
    use_refiner=True,
    seed=1
)
images = image_gen_response.images

# Display each generated image returned by OctoAI
for i, image in enumerate(images):
    pil_image = image.to_pil()
    display(pil_image)

Feel free to experiment with the parameters to see what happens to the resulting image. In this case, I’ve put in a simple prompt meant to describe the Docker logo: “a whale with a container on its back.” I also added standard negative prompts to help generate the style of image I’m looking for. Figure 3 shows the output.

Figure 3: An SDXL-generated image of a whale with a container on its back.

4. Control your image output with ControlNet

One thing you may want to do with SDXL is control the composition of your AI-generated image. For example, you can specify a particular human pose, or control the composition and perspective of a given photograph. For our experiment using Moby (the Docker mascot), we’d like to get an AI-generated image that can be easily superimposed onto the original logo — same shape of whale and container, same orientation of the subject, same size, and so forth. This is where a ControlNet comes in handy: it lets you constrain the generation of images by feeding a control image as input. In our example, we’ll feed the image of the Moby logo as our control input.

By tweaking the following parameters used by the ImageGenerator API, we constrain the SDXL image generation with a control image of Moby. That control image is converted into a depth map using a depth estimation model, then fed into the ControlNet, which constrains SDXL image generation.

# Set the engine to controlnet SDXL
engine="controlnet-sdxl",
# Select the depth controlnet, which uses a depth map to apply
# constraints to SDXL
controlnet="depth_sdxl",
# Set the conditioning scale anywhere between 0 and 1; try different
# values to see what they do!
controlnet_conditioning_scale=0.3,
# Pass in the base64-encoded string of the moby logo image
controlnet_image=image_to_base64(moby_image),

Now the result matches the Moby outline a lot more closely (Figure 4). This is the power of ControlNet. You can adjust its strength by varying the controlnet_conditioning_scale parameter; this way, you can make the output image match the control image of Moby more or less faithfully.

Figure 4: Left: The Moby logo is used as a control image to a ControlNet. Right: The SDXL-generated image resembles the control image more closely than in the previous example.

5. Control your image output with SDXL style presets

Let’s add a layer of customization with SDXL styles. We’ll use the 3D Model style preset (Figure 5), as sketched below.
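Here is a minimal sketch of how the preset might be passed to the same generate() call used earlier. The style_preset parameter itself appears in the full pipeline code later in this article, but the exact "3d-model" identifier string below is an assumption, so check the OctoAI style list for the valid preset names.

# Re-run the earlier generation, this time with a style preset applied.
# Everything except style_preset is identical to the previous example.
image_gen_response = image_gen.generate(
    engine=Engine.SDXL,
    prompt="a whale with a container on its back",
    negative_prompt="blurry photo, distortion, low-res, poor quality",
    width=1024,
    height=1024,
    num_images=1,
    sampler="DPM_PLUS_PLUS_2M_KARRAS",
    steps=20,
    cfg_scale=7.5,
    use_refiner=True,
    seed=1,
    # Assumed identifier for the "3D Model" style preset
    style_preset="3d-model"
)
display(image_gen_response.images[0].to_pil())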
Behind the scenes, these style presets add additional keywords to the positive and negative prompts that the SDXL model ingests.

Figure 5: You can try various styles on the OctoAI Image Generation solution UI — there are more than 100 to choose from, each delivering a unique feel and aesthetic.

Figure 6 shows how setting this one parameter in the ImageGenerator API transforms our AI-generated image of Moby. Go ahead and try out more styles; we’ve generated a gallery for you to get inspiration from.

Figure 6: SDXL-generated image of Moby with the “3D Model” style preset applied.

6. Manipulate images with Mistral-7B LLM

So far we’ve relied on SDXL, which does text-to-image generation. We’ve added ControlNet in the mix to apply a control image as a compositional constraint. Next, we’re going to layer an LLM into the mix to transform our original image prompt into a creative and rich textual description based on a “transformation prompt.” Basically, we’re going to use an LLM to make our prompt better automatically. This will allow us to perform image manipulation using text in our OctoShop model cocktail pipeline:

Take a logo of Moby: Set it into an ultra-realistic photo in space.
Take a child’s drawing: Bring it to life in a fantasy world.
Take a photo of a cocktail: Set it on a beach in Italy.
Take a photo of a living room: Transform it into a staged living room in a designer house.

To achieve this text-to-text transformation, we will use the LLM user prompt as follows. This sets the original textual description of Moby into a new setting: the vast expanse of space.

'''
Human: set the image description into space: “a whale with a container on its back”
AI:
'''

We’ve configured the LLM system prompt so that LLM responses are concise and at most one sentence long. We could make them longer, but be aware that the prompt consumed by SDXL has a 77-token context limit. You can read more on the text generation Python SDK and its Chat Completions API used to generate text:

Model: Lets you choose from a selection of foundational open source models like Mixtral, Mistral, Llama2, and Code Llama (the selection will grow as more open source models are released).
Messages: Contains a list of messages (system and user) to use as context for the completion.
Max tokens: Enforces a hard limit on output tokens (this could cut a completion response in the middle of a sentence).
Temperature: Lets you control the creativity of your answer: with a higher temperature, less likely tokens can be selected.

The choice of model, input, and output tokens will influence pricing on OctoAI. In this example, we’re using the Mistral-7B LLM, which is a great open source LLM that really packs a punch given its small parameter size. Let’s look at the code used to invoke our Mistral-7B LLM:

# Let's go ahead and start with the original prompt that we used in our
# image generation examples.
image_desc = "a whale with a container on its back"

# Let's then prepare our LLM prompt to manipulate our image
llm_prompt = '''
Human: set the image description into space: {}
AI: '''.format(image_desc)

# Now let's use an LLM to transform this craft clay rendition
# of Moby into a fun sci-fi universe
from octoai.client import Client

client = Client(OCTOAI_API_TOKEN)
completion = client.chat.completions.create(
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Keep your responses short and limited to one sentence."
        },
        {
            "role": "user",
            "content": llm_prompt
        }
    ],
    model="mistral-7b-instruct-fp16",
    max_tokens=128,
    temperature=0.01
)

# Print the message we get back from the LLM
llm_image_desc = completion.choices[0].message.content
print(llm_image_desc)

The output shows that our LLM has created a short yet imaginative description of Moby traveling through space. Figure 7 shows the result when we feed this LLM-generated textual description into SDXL.

Figure 7: SDXL-generated image of Moby where we used an LLM to set the scene in space and enrich the text prompt.

This image is great. We can feel the immensity of space. With the power of LLMs and the flexibility of SDXL, we can take image creation and manipulation to new heights. And the great thing is, all we need to manipulate those images is text; the GenAI models do the rest of the work.

7. Automate the workflow with AI-based image labeling

So far in our image transformation pipeline, we’ve had to manually label the input image to our OctoShop model cocktail. Instead of just passing in the image of Moby, we had to provide a textual description of that image. Thankfully, we can rely on a GenAI model to perform text labeling tasks: CLIP Interrogator. Think of this task as the reverse of what SDXL does: it takes in an image and produces text as the output.

To get started, we’ll need a CLIP Interrogator model running behind an endpoint somewhere. There are two ways to get a CLIP Interrogator model endpoint on OctoAI. If you’re just getting started, we recommend the simple approach, and if you feel inspired to customize your model endpoint, you can use the more advanced approach. For instance, you may be interested in trying out the more recent version of CLIP Interrogator.

You can now invoke the CLIP Interrogator model in a few lines of code. We’ll use the fast interrogator mode here to get a label generated as quickly as possible.

# Let's go ahead and invoke the CLIP interrogator model
# Note that under a cold start scenario, you may need to wait a minute or two
# to get the result of this inference... Be patient!
output = client.infer(
    endpoint_url=CLIP_ENDPOINT_URL+'/predict',
    inputs={
        "image": image_to_base64(moby_image),
        "mode": "fast"
    }
)

# All labels
clip_labels = output["completion"]["labels"]
print(clip_labels)

# Let's get just the top label
top_label = clip_labels.split(',')[0]
print(top_label)

The top label the model produced for our Moby logo is pretty on point. Now that we’ve tested all the ingredients individually, let’s assemble our model cocktail and test it on interesting use cases.

8. Assembling the model cocktail

Now that we have tested our three models (CLIP Interrogator, Mistral-7B, SDXL), we can package them into one convenient function, which takes the following inputs:

An input image that will be used to control the output image and also be automatically labeled by our CLIP Interrogator model.
A transformation string that describes the transformation we want to apply to the input image (e.g., “set the image description in space”).
A style string that lets us better control the artistic output of the image independently of the transformation we apply to it (e.g., painterly style vs. cinematic).

The function below is a rehash of all of the code we’ve introduced above, packed into one function.
def genai_transform(image: Image, transformation: str, style: str) -> tuple[str, str, Image]:
    # Step 1: CLIP captioning
    output = client.infer(
        endpoint_url=CLIP_ENDPOINT_URL+'/predict',
        inputs={
            "image": image_to_base64(image),
            "mode": "fast"
        }
    )
    clip_labels = output["completion"]["labels"]
    top_label = clip_labels.split(',')[0]

    # Step 2: LLM transformation
    llm_prompt = '''
    Human: {}: {}
    AI: '''.format(transformation, top_label)
    completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are a helpful assistant. Keep your responses short and limited to one sentence."
            },
            {
                "role": "user",
                "content": llm_prompt
            }
        ],
        model="mistral-7b-instruct-fp16",
        max_tokens=128,
        presence_penalty=0,
        temperature=0.1,
        top_p=0.9,
    )
    llm_image_desc = completion.choices[0].message.content

    # Step 3: SDXL + ControlNet transformation
    image_gen_response = image_gen.generate(
        engine="controlnet-sdxl",
        controlnet="depth_sdxl",
        controlnet_conditioning_scale=0.4,
        controlnet_image=image_to_base64(image),
        prompt=llm_image_desc,
        negative_prompt="blurry photo, distortion, low-res, poor quality",
        width=1024,
        height=1024,
        num_images=1,
        sampler="DPM_PLUS_PLUS_2M_KARRAS",
        steps=20,
        cfg_scale=7.5,
        use_refiner=True,
        seed=1,
        style_preset=style
    )
    images = image_gen_response.images

    # Convert the generated image returned by OctoAI into a PIL image
    pil_image = images[0].to_pil()
    return top_label, llm_image_desc, pil_image

Now you can try this out on several images, prompts, and styles.

Package your model cocktail into a web app

Now that you’ve mixed your unique GenAI cocktail, it’s time to pour it into a glass and garnish it, figuratively speaking. We built a simple Streamlit frontend that lets you deploy your unique OctoShop GenAI model cocktail and share the results with your friends and colleagues (Figure 8). You can check it out on GitHub. Follow the README instructions to deploy your app locally or get it hosted on Streamlit’s web hosting services.

Figure 8: The Streamlit app transforms images into realistic renderings in space — all thanks to the magic of GenAI.

We look forward to seeing what great image-processing apps you come up with. Go ahead and share your creations on OctoAI’s Discord server in the #built_with_octo channel! If you want to learn how to put OctoShop behind a Discord bot or build your own model containers with Docker, we also have instructions on how to do that from an AI/ML workshop organized by OctoAI at DockerCon 2023.

About OctoAI

OctoAI provides infrastructure to run GenAI at scale, efficiently, and robustly. The model endpoints that OctoAI delivers to serve models like Mixtral, Stable Diffusion XL, etc. all rely on Docker to containerize models and make them easier to serve at scale. If you go to octoai.cloud, you’ll find three complementary solutions that developers can build on to bring their GenAI-powered apps and pipelines into production:

Image Generation solution exposes endpoints and APIs to perform text-to-image and image-to-image tasks built around open source foundational models such as Stable Diffusion XL or SSD.
Text Generation solution exposes endpoints and APIs to perform text generation tasks built around open source foundational models such as Mixtral/Mistral, Llama2, or CodeLlama.
Compute solution lets you deploy and manage any dockerized model container on capable OctoAI cloud endpoints to power your demanding GenAI needs.
This compute service complements the image generation and text generation solutions by exposing programmability and customizability for AI tasks that are not currently readily available on either of those solutions.

Disclaimer

OctoShop is built on the foundation of CLIP Interrogator, SDXL, and Mistral-7B and is therefore likely to carry forward the potential dangers inherent in these base models. It’s capable of generating unintended, unsuitable, offensive, and/or incorrect outputs. We therefore strongly recommend exercising caution and conducting comprehensive assessments before deploying this model into any practical applications. This GenAI model workflow doesn’t work on people, as it won’t preserve their likeness; the pipeline works best on scenes, objects, or animals. Solutions are available to address this problem, such as face mapping techniques (also known as face swapping), which we can containerize with Docker and deploy on the OctoAI Compute solution, but that’s something to cover in another blog post.

Conclusion

This article covered the fundamentals of building a GenAI model cocktail by relying on a combination of text generation, image generation, and compute solutions powered by the portability and scalability enabled by Docker containerization. If you’re interested in learning more about building these kinds of GenAI model cocktails, check out the OctoAI demo page or join OctoAI on Discord to see what people have been building.

Acknowledgements

The authors acknowledge Justin Gage for his thorough review, as well as Luis Vega, Sameer Farooqui, and Pedro Toruella for their contributions to the DockerCon AI/ML Workshop 2023, which inspired this article. The authors also thank Cia Bodin and her daughter Ada for the drawing used in this blog post.

Learn more

Watch the DockerCon 2023 Docker for ML, AI, and Data Science workshop.
Get the latest release of Docker Desktop.
Vote on what’s next! Check out our public roadmap.
Have questions? The Docker community is here to help.
New to Docker? Get started.

View the full article