Showing results for tags 'best practices'.

  1. Learn why you need to track in-app events, what to track and what not to track, and get a few pro tips on designing your event data. View the full article
  2. Learn how to successfully plan and instrument event tracking for your websites and applications to improve data quality at the source. View the full article
  3. Maintaining a strong security posture is crucial in today’s digital landscape, and it begins with users. Trusting users with access to sensitive data and company assets is a web of complexity, and one bad apple or security gap can knock all the dominoes down. In fact, Verizon’s 2023 Data Breach Investigations Report noted that 74% […] The post 9 Best Practices for Using AWS Access Analyzer appeared first on Security Boulevard. View the full article
  4. Data privacy in email communication refers to the protection and confidentiality of personal data. Learn about data privacy regulations, particularly GDPR. The post Data Privacy in Email Communication: Compliance, Risks, and Best Practices appeared first on Security Boulevard. View the full article
  5. How to bridge the dev/data divide through alignment, collaboration, early enforcement, and transparency. View the full article
  6. MySQL provides several replication configuration options. However, with so many choices, ensuring replication is configured correctly can take time and effort. Replication is a crucial initial step to enhance availability in MySQL databases. A properly designed replication architecture can significantly impact the accessibility of your data and prevent potential management complications. This article will delve into […] View the full article
  7. AI has become an integral part of my workflow these days, and with the assistance of GitHub Copilot, I move a lot faster when I’m building a project. Having used AI tools to increase my productivity over the past year, I’ve realized that, similar to learning how to use a new framework or library, we can enhance our efficiency with AI tools by learning how to best use them. In this blog post, I’ll share some of the daily things I do to get the most out of GitHub Copilot. I hope these tips will help you become a more efficient and productive user of the AI assistant.

Beyond code completion

To make full use of the power of GitHub Copilot, it’s important to understand its capabilities. GitHub Copilot is developing rapidly, and new features are being added all the time. It’s no longer just a code completion tool in your editor—it now includes a chat interface that you can use in your IDE, a command line tool via a GitHub CLI extension, a summary tool in your pull requests, a helper tool in your terminals, and much, much more. In a recent blog post, I listed some of the ways you didn’t know you could use GitHub Copilot, which gives a great overview of how much the AI assistant can currently do. But beyond interacting with GitHub Copilot, how do you help it give you better answers? Well, the answer to that needs a bit more context.

Context, context, context

If you understand Large Language Models (LLMs), you will know that they are designed to make predictions based on the context provided. This means that the more contextually rich our input or prompt is, the better the prediction or output will be. As such, learning to provide as much context as possible is key when interacting with GitHub Copilot, especially with the code completion feature. Unlike ChatGPT, where you need to provide all the data to the model in the prompt window, installing GitHub Copilot in your editor lets the assistant infer context from the code you’re working on. It then uses that context to provide code suggestions. We already know this, but what else can we do to give it additional context? Here are a few essential tips for providing GitHub Copilot with more context in your editor so you get the most relevant and useful code out of it.

1. Open your relevant files

Having your files open provides GitHub Copilot with context. When you have additional files open, they help to inform the suggestion that is returned. Remember, if a file is closed, GitHub Copilot cannot see its content in your editor, which means it cannot get context from those closed files. GitHub Copilot looks at the currently open files in your editor to analyze the context, create a prompt that gets sent to the server, and return an appropriate suggestion. Have a few files open in your editor to give GitHub Copilot a bigger picture of your project. You can also use #editor in the chat interface to provide GitHub Copilot with additional context on your currently opened files in Visual Studio Code (VS Code) and Visual Studio. https://github.blog/wp-content/uploads/2024/03/01_editor_command_open_files.mp4 Remember to close unneeded files when context switching or moving on to the next task.
2. Provide a top-level comment

Just as you would give a brief, high-level introduction to a coworker, a top-level comment in the file you’re working in can help GitHub Copilot understand the overall context of the pieces you will be creating—especially if you want your AI assistant to generate the boilerplate code to get you going. Be sure to include details about what you need and provide a good description so it has as much information as possible. This will help to guide GitHub Copilot toward better suggestions and give it a goal to work on. Having examples, especially when processing data or manipulating strings, helps quite a bit.

3. Set includes and references

It’s best to manually set the includes/imports or module references you need for your work, particularly if you’re working with a specific version of a package. GitHub Copilot will make suggestions, but you know what dependencies you want to use. This can also help to let GitHub Copilot know what frameworks, libraries, and versions you’d like it to use when crafting suggestions. It can be helpful for jump-starting GitHub Copilot to a newer library version when it defaults to providing older code suggestions. https://github.blog/wp-content/uploads/2024/03/03_includes_references.mp4

4. Meaningful names matter

The names of your variables and functions matter. If you have a function named foo or bar, GitHub Copilot will not be able to give you the best completion because it isn’t able to infer intent from the names. Just as the function name fetchData() won’t mean much to a coworker (or to you after a few months), fetchData() won’t mean much to GitHub Copilot either. Implementing good coding practices will help you get the most value from GitHub Copilot. While GitHub Copilot helps you code and iterate faster, the old rule of programming still applies: garbage in, garbage out.

5. Provide specific and well-scoped function comments

Commenting your code helps you get very specific, targeted suggestions. A function name can only be so descriptive without being overly long, so function comments can help fill in details that GitHub Copilot might need to know. One of the neat features of GitHub Copilot is that it can determine the correct comment syntax that is typically used in your programming language for function/method comments and will help create them for you based on what the code does. Adding more detail to these as your first change helps GitHub Copilot determine what you would like to do in code and how to interact with that function. Remember: single, specific, short comments help GitHub Copilot provide better context. https://github.blog/wp-content/uploads/2024/03/05_simple_specific_short.mp4

6. Provide sample code

Providing sample code to GitHub Copilot will help it determine what you’re looking for. This helps to ground the model and provide it with even more context. It also helps GitHub Copilot generate suggestions that match the language and tasks you want to achieve, and return suggestions based on your current coding standards and practices. Unit tests provide one level of sample code at the individual function/method level, but you can also provide code examples in your project showing how to do things end to end. The cool thing about using GitHub Copilot long term is that it nudges us to do a lot of the good coding practices we should’ve been doing all along.
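As an illustration of tips 2, 4, and 5, here is a minimal Python sketch (not from the original article) showing the kind of context a top-level comment, descriptive names, and short, well-scoped function comments give an AI assistant. The module purpose, function names, and field names are invented for the example.

"""Billing report helper: aggregates invoice line items into per-customer
totals and flags customers whose spend exceeds a configurable budget."""

from collections import defaultdict


def total_spend_per_customer(line_items):
    """Sum the 'amount' field of each line item, grouped by 'customer_id'.

    Each line item is a dict like {"customer_id": "c-42", "amount": 19.99}.
    """
    totals = defaultdict(float)
    for item in line_items:
        totals[item["customer_id"]] += item["amount"]
    return dict(totals)


def customers_over_budget(totals, budget):
    """Return customer IDs whose total spend is strictly above the budget."""
    return [customer_id for customer_id, total in totals.items() if total > budget]

With descriptive names and docstrings like these in place, a completion tool has far more to work with than it would from functions named foo() or process().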
Learn more about providing context to GitHub Copilot by watching this YouTube video: Inline Chat with GitHub Copilot.

Inline chat

Outside of providing enough context, there are some built-in features of GitHub Copilot that you may not be taking advantage of. Inline chat, for example, gives you an opportunity to chat with GitHub Copilot right between your lines of code. By pressing CMD + I (CTRL + I on Windows) you’ll have Copilot right there to ask questions. This is a bit more convenient for quick fixes than opening up GitHub Copilot Chat’s side panel. https://github.blog/wp-content/uploads/2024/03/07_a_inline_chat_animated.mp4 This experience provides you with code diffs inline, which is awesome. There are also special slash commands available, like creating documentation with just the slash of a button.

Tips and tricks with GitHub Copilot Chat

GitHub Copilot Chat provides an experience in your editor where you can have a conversation with the AI assistant. You can improve this experience by using built-in features to make the most out of it.

8. Remove irrelevant requests

Did you know that you can delete a previously asked question in the chat interface to remove it from the indexed conversation, especially if it is no longer relevant? Doing this will improve the flow of the conversation and give GitHub Copilot only the information it needs to provide you with the best output.

9. Navigate through your conversation

Another tip is to use the up and down arrows to navigate through your conversation with GitHub Copilot Chat. I found myself scrolling through the chat interface to find the last question I asked, then discovered I can just use my keyboard arrows, just like in the terminal. https://github.blog/wp-content/uploads/2024/03/09_up_down_arrows_animated.mp4

10. Use the @workspace agent

If you’re using VS Code or Visual Studio, remember that agents are available to help you go even further. The @workspace agent, for example, is aware of your entire workspace and can answer questions related to it. As such, it can provide even more context when you’re trying to get a good output from GitHub Copilot. https://github.blog/wp-content/uploads/2024/03/10_workspace_agent.mp4

11. Highlight relevant code

Another great tip when using GitHub Copilot Chat is to highlight relevant code in your files before asking it questions. This helps it give targeted suggestions and provides the assistant with more context into what you need help with.

12. Organize your conversations with threads

You can have multiple ongoing conversations with GitHub Copilot Chat on different topics by isolating them with threads. We’ve provided a convenient way for you to start a new conversation (thread) by clicking the + sign on the chat interface.

13. Slash commands for common tasks

Slash commands are awesome, and there are quite a few of them. There are commands to help you explain code, fix code, create a new notebook, write tests, and many more. They are shortcuts to common prompts that we’ve found to be particularly helpful in day-to-day development from our own internal usage.

/explain — Get code explanations. Usage: open a file with code or highlight the code you want explained and type: /explain what is the fetchPrediction method?
/fix — Receive a proposed fix for the problems in the selected code. Usage: highlight problematic code and type: /fix propose a fix for the problems in fetchAirports route
/tests — Generate unit tests for selected code. Usage: open a file with code or highlight the code you want tests for and type: /tests
/help — Get help on using Copilot Chat. Usage: type: /help what can you do?
/clear — Clear the current conversation. Usage: type: /clear
/doc — Add a documentation comment. Usage: highlight code and type: /doc. You can also press CMD+I in your editor and type /doc inline
/generate — Generate code to answer your question. Usage: type: /generate code that validates a phone number
/optimize — Analyze and improve the running time of the selected code. Usage: highlight code and type: /optimize fetchPrediction method
/new — Scaffold code for a new workspace. Usage: type: /new create a new django app
/simplify — Simplify the selected code. Usage: highlight code and type: /simplify
/feedback — Provide feedback to the team. Usage: type: /feedback

See the following image for commands available in VS Code.

14. Attach relevant files for reference

In Visual Studio and VS Code, you can attach relevant files for GitHub Copilot Chat to reference by using #file. This scopes GitHub Copilot to a particular context in your code base and gives you a much better outcome. To reference a file, type # in the comment box, choose #file, and you will see a popup where you can choose your file. You can also type #file_name.py in the comment box. See below for an example: https://github.blog/wp-content/uploads/2024/03/14_attach_filename.mp4

15. Start with GitHub Copilot Chat for faster debugging

These days, whenever I need to debug some code, I turn to GitHub Copilot Chat first. Most recently, I was implementing a decision tree and performing k-fold cross-validation. I kept getting incorrect accuracy scores and couldn’t figure out why. I turned to GitHub Copilot Chat for some assistance, and it turns out I wasn’t using my training data set (X_train, y_train), even though I thought I was: “I'm catching up on my AI/ML studies today. I had to implement a DecisionTree and use the cross_val_score method to evaluate the model's accuracy score. I couldn't figure out why the incorrect values for the accuracy scores were being returned, so I turned to Chat for some help” pic.twitter.com/xn2ctMjAnr — Kedasha is learning about AI + ML (@itsthatladydev) March 23, 2024

I figured this out a lot faster than I would have with external resources. I want to encourage you to start with GitHub Copilot Chat in your editor to get debugging help faster instead of going to external resources first. Follow my example above by explaining the problem, pasting the problematic code, and asking for help. You can also highlight the problematic code in your editor and use the /fix command in the chat interface. A sketch of this kind of bug appears after these tips.

Be on the lookout for sparkles!

In VS Code, you can quickly get help from GitHub Copilot by looking out for “magic sparkles.” For example, in the commit comment section, clicking the magic sparkles will help you generate a commit message with the help of AI. You can also find magic sparkles inline in your editor as you’re working, for a quick way to access GitHub Copilot inline chat. https://github.blog/wp-content/uploads/2024/03/15_magic_sparkles.mp4 Pressing them will use AI to help you fill out the data, and more magic sparkles are being added wherever we find other places for GitHub Copilot to help in your day-to-day coding experience.
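The following is a minimal, hypothetical reconstruction (not the author’s actual code) of the cross-validation mistake described in tip 15: evaluating on the full dataset instead of the training split. It assumes scikit-learn and uses the Iris dataset purely as a stand-in.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)

# The kind of bug described above: cross-validating on the full dataset (X, y),
# which silently includes the held-out test rows.
wrong_scores = cross_val_score(model, X, y, cv=5)

# The fix: cross-validate on the training split only.
right_scores = cross_val_score(model, X_train, y_train, cv=5)

print("scores on full data:     ", wrong_scores.mean())
print("scores on training split:", right_scores.mean())

Pasting a snippet like this into the chat, along with a description of the unexpected accuracy values, is exactly the workflow the author recommends.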
Know where your AI assistant shines

To get the most out of the tool, remember that context and prompt crafting are essential, and so is understanding where the tool shines. GitHub Copilot is very good at boilerplate code and scaffolding, writing unit tests, writing documentation, pattern matching, explaining uncommon or confusing syntax, cron jobs and regular expressions, helping you remember things you’ve forgotten, and debugging. But never forget that you are in control, and GitHub Copilot is here as just that: your copilot. It is a tool that can help you write code faster, and it’s up to you to decide how best to use it. It is not here to do your work for you or to write everything for you. It will guide you and nudge you in the right direction, just as a coworker would if you asked them questions or for guidance on a particular issue. I hope these tips and best practices were helpful. You can significantly improve your coding efficiency and output by properly leveraging GitHub Copilot. Learn more about how GitHub Copilot works by reading Inside GitHub: Working with the LLMs behind GitHub Copilot and Customizing and fine-tuning LLMs: What you need to know. Harness the power of GitHub Copilot. Learn more or get started now.
  8. In today's fast-paced world of technology, efficient application deployment and management are crucial. Kubernetes, a game-changing platform for container orchestration, is at the forefront of this evolution. At Atmosly, we leverage Kubernetes to empower organizations in navigating the rapidly evolving digital landscape, offering solutions that intertwine with Kubernetes' strengths to enhance your technological capabilities. What Is Kubernetes? Kubernetes, or K8s, is a revolutionary container orchestration system born at Google. It has become a cornerstone of contemporary IT infrastructure, providing robust solutions for deploying, scaling, and managing containerized applications. At Atmosly, we integrate Kubernetes into our offerings, ensuring our clients benefit from its scalable, reliable, and efficient nature. View the full article
  9. Learn about the benefits and best practices of a payroll register, and see an example, in this comprehensive guide. View the full article
  10. In the dynamic realm of software development and deployment, Docker has emerged as a cornerstone technology, revolutionizing the way developers package, distribute, and manage applications. Docker simplifies the process of handling applications by containerizing them, ensuring consistency across various computing environments. A critical aspect of Docker that often puzzles many is Docker networking. It’s an essential feature, enabling containers to communicate with each other and the outside world. This ultimate guide aims to demystify Docker networking, offering you tips, tricks, and best practices to leverage Docker networking effectively. Understanding Docker Networking Basics Docker networking allows containers to communicate with each other and with other networks. Docker provides several network drivers, each serving different use cases: View the full article
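As a companion to the Docker networking overview above, here is a small illustrative sketch (not part of the original article) using the Docker SDK for Python to create a user-defined bridge network and attach two containers so they can reach each other by name. The network, image, and container names are placeholders, and a local Docker daemon is assumed to be running.

import docker

client = docker.from_env()  # assumes a local Docker daemon is available

# Create a user-defined bridge network; containers attached to it can
# resolve each other by container name through Docker's embedded DNS.
client.networks.create("app_net", driver="bridge")

# Start two containers on that network (image and container names are placeholders).
client.containers.run("redis:7-alpine", name="cache", network="app_net", detach=True)
client.containers.run("nginx:alpine", name="web", network="app_net", detach=True)

# From inside the "web" container, the hostname "cache" now resolves to the redis container.

User-defined bridge networks like this are generally preferable to the default bridge precisely because of the built-in name resolution between containers.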
  11. There are several steps involved in implementing a data pipeline that integrates Apache Kafka with Amazon RDS and uses AWS Lambda and API Gateway to feed data into a web application. Here is a high-level overview of how to architect this solution:

1. Set up Apache Kafka

Apache Kafka is a distributed streaming platform capable of handling trillions of events a day. To set up Kafka, you can either install it on an EC2 instance or use Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. View the full article
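To make the Lambda piece of this architecture concrete, here is a hedged Python sketch of a handler triggered by an Amazon MSK event source mapping that decodes Kafka records and inserts them into a MySQL-compatible RDS table. The table name, payload fields, and the use of pymysql are illustrative assumptions, not details from the original article.

import base64
import json
import os

import pymysql  # assumed dependency packaged with the function


def handler(event, context):
    # MSK event source mappings deliver base64-encoded record values,
    # grouped by topic-partition under event["records"].
    rows = []
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            rows.append((payload["id"], payload["name"]))  # illustrative fields

    if not rows:
        return {"inserted": 0}

    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    try:
        with conn.cursor() as cur:
            cur.executemany("INSERT INTO events (id, name) VALUES (%s, %s)", rows)
        conn.commit()
    finally:
        conn.close()
    return {"inserted": len(rows)}

API Gateway would then front a separate read path (for example, another Lambda that queries the same table) to feed the web application.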
  12. A swift response to a major outage can make a big difference in retaining customer confidence. View the full article
  13. Because of the critical nature of the DevOps pipeline, security is becoming a top priority. Here's how to integrate DevSecOps. View the full article
  14. Docker images play a pivotal role in containerized application deployment. They encapsulate your application and its dependencies, ensuring consistent and efficient deployment across various environments. However, security is a paramount concern when working with Docker images. In this guide, we will explore security best practices for Docker images to help you create and maintain secure images for your containerized applications.

1. Introduction: the significance of Docker images

Docker images are at the core of containerization, offering a standardized approach to packaging applications and their dependencies. They allow developers to work in controlled environments and empower DevOps teams to deploy applications consistently across various platforms. However, the advantages of Docker images come with security challenges, making it essential to adopt best practices to protect your containerized applications. View the full article
  15. Private Service Connect is a Cloud Networking offering that creates a private and secure connection from your VPC networks to a service producer, and is designed to help you consume services faster, protect your data, and simplify service management. However, like all complex networking setups, sometimes things don’t work as planned. In this post, you will find useful tips that can help you tackle issues related to Private Service Connect, even before reaching out to Cloud Support.

Introduction to Private Service Connect

Before we get into the troubleshooting bits, let’s briefly discuss the basics of Private Service Connect. Understanding your setup is key to isolating the problem. Private Service Connect is similar to private services access, except that the service producer VPC network doesn’t connect to your (consumer) network using VPC Network Peering. A Private Service Connect service producer can be Google, a third party, or even yourself. When we talk about consumers and producers, it’s important to understand what type of Private Service Connect is configured on the consumer side and what kind of managed service it intends to connect with on the producer side. Consumers are the ones who want the services, while producers are the ones who provide them. The various types of Private Service Connect configurations are:

  • Private Service Connect endpoints are configured as forwarding rules that are allocated an IP address and mapped to a managed service by targeting a Google APIs bundle or a service attachment. These managed services can be diverse, ranging from global Google APIs to Google managed services, third-party services, and even in-house, intra-organization services. When a consumer creates an endpoint that references a Google APIs bundle, the endpoint’s IP address is a global internal IP address – the consumer picks an internal IP address that’s outside all subnets of the consumer’s VPC network and connected networks. When a consumer creates an endpoint that references a service attachment, the endpoint’s IP address is a regional internal IP address in the consumer’s VPC network, from a subnet in the same region as the service attachment.
  • Private Service Connect backends are configured with a special network endpoint group of type Private Service Connect, which refers to a locational Google API or to a published service’s service attachment. A service attachment is your link to a compatible producer load balancer.
  • Private Service Connect interfaces are a special type of network interface that allows service producers to initiate connections to service consumers.

How Private Service Connect works

Network Address Translation (NAT) is the underlying network technology that powers Private Service Connect, using Google Cloud’s software-defined networking stack, Andromeda. Let’s break down how Private Service Connect works to access a published service based on an internal network passthrough load balancer using a Private Service Connect endpoint. In this scenario, you set up a Private Service Connect endpoint on the consumer side by configuring a forwarding rule that targets a service attachment. This endpoint has an IP address within your VPC network.
When a VM instance in the VPC network sends traffic to this endpoint, the host’s networking stack applies client-side load balancing to send the traffic to a destination host based on location, load, and health. The packets are encapsulated and routed through Google Cloud’s network fabric. At the destination host, the packet processor applies source network address translation (SNAT) and destination network address translation (DNAT), using the configured NAT subnet and the producer IP address of the service, respectively. The packet is then delivered to the VM instance serving as the load balancer’s backend. All of this is orchestrated by Andromeda’s control plane; with a few exceptions, there are no middle boxes or intermediaries involved in this process, enabling you to achieve line-rate performance. For additional details, see Private Service Connect architecture and performance.

With this background, you should already be able to identify the main components where issues could occur: the source host, the network fabric, the destination host, and the control plane.

Know your troubleshooting tools

The Google Cloud console provides you with the following tools to troubleshoot most of the Private Service Connect issues that you might encounter.

Connectivity Tests

Connectivity Tests is a diagnostics tool that lets you check connectivity between network endpoints. It analyzes your configuration and, in some cases, performs live data-plane analysis between the endpoints.

  • Configuration analysis supports Private Service Connect: consumers can check connectivity from their source systems to Private Service Connect endpoints (or consumer load balancers using Private Service Connect NEG backends), while producers can verify that their service is operational for consumers.
  • Live data plane analysis supports Private Service Connect endpoints for both published services and Google APIs: verify reachability and latency between hosts by sending probe packets over the data plane. This feature provides baseline diagnostics of latency and packet loss. In cases where live data plane analysis is not available, consumers can coordinate with a service producer to collect simultaneous packet captures at the source and destination using tcpdump.

Cloud Logging

Cloud Logging is a fully managed service that allows you to store, search, analyze, monitor, and alert on logging data and events. Audit logs let you monitor Private Service Connect activity. Use them to track intentional or unintentional changes to Private Service Connect resources, find errors or warnings, and monitor changes in connection status for the endpoint. They are most useful when troubleshooting issues during setup or updates to the configuration. For example, you can track endpoint connection status changes (pscConnectionStatus) by examining audit logs for your GCE forwarding rule resource:

resource.type="gce_forwarding_rule"
protoPayload.methodName="LogPscConnectionStatusUpdate"

VPC Flow Logs

Use VPC Flow Logs to monitor Private Service Connect traffic. Consumers can enable VPC Flow Logs on the client subnet to monitor traffic directed to the Private Service Connect endpoint; this allows the consumer to validate traffic egressing the VM instance. Producers can enable VPC Flow Logs on the target load balancer subnet to monitor traffic ingressing their backend VM instances.
Consider that VPC Flow Logs are sampled and may not capture short-lived connections. To get more detailed information, run a packet capture using tcpdump.

Cloud Monitoring

Another member of the observability stack, Cloud Monitoring can help you gain visibility into the performance of Private Service Connect.

  • Producer metrics to monitor published services: Take a look at the utilization of service attachment resources such as NAT ports, connected forwarding rules, and connections by service attachment ID to correlate with connectivity and performance issues. Check whether there are any dropped packets on the producer side (a Preview feature): a high received-packets-dropped count is related to NAT resource exhaustion, while a high sent-packets-dropped count indicates that a service backend is sending packets to a consumer after the NAT translation state has expired. When this occurs, make sure you are following the NAT subnet recommendations. A packet capture can bring more insight into the nature of the dropped packets. Using this MQL query, producers can monitor NAT subnet capacity for a specific service attachment:

fetch gce_service_attachment
| metric
    'compute.googleapis.com/private_service_connect/producer/used_nat_ip_addresses'
| filter (resource.region == "us-central1"
    && resource.service_attachment_id == "[SERVICE_ATTACHMENT_ID]")
| group_by 1m,
    [value_used_nat_ip_addresses_mean: mean(value.used_nat_ip_addresses)]
| every 1m
| group_by [resource.service_attachment_id],
    [value_used_nat_ip_addresses_mean_mean:
       mean(value_used_nat_ip_addresses_mean)]

  • Consumer metrics to monitor endpoints: You can track the number of connections created, opened, and closed from clients to the Private Service Connect endpoint. If you see packet drops, take a look at the producer metrics as well. For more information, see Monitor Private Service Connect connections.

TIP: Be proactive and set alerts to inform you when you are close to exhausting a known limit (including Private Service Connect quotas). For example, you can use this MQL query to track usage of the PSC internal load balancer forwarding rules quota:

fetch compute.googleapis.com/VpcNetwork
| metric
    'compute.googleapis.com/quota/psc_ilb_consumer_forwarding_rules_per_producer_vpc_network/usage'
| group_by 1m, [value_usage_mean: mean(value.usage)]
| every 1m
| group_by [], [value_usage_mean_aggregate: aggregate(value_usage_mean)]

Read the manual

Consult the Google Cloud documentation to learn about the limitations and supported configurations, and follow the Private Service Connect guides. Especially for new deployments, it is common to misconfigure a component or find that it is not compatible or supported yet. Ensure that you have gone through the right configuration steps, and review the limitations and compatibility matrix. Take a look at the VPC release notes to see if there are any known issues related to Private Service Connect, and look for any new features that could have introduced unwanted behavior.
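If you prefer to pull these metrics programmatically rather than through MQL in the console, here is a hedged sketch using the google-cloud-monitoring Python client to list recent samples of the producer NAT-usage metric mentioned above. The project ID is a placeholder, the metric name is taken from the article’s MQL example, and error handling is omitted.

import time

from google.cloud import monitoring_v3  # pip install google-cloud-monitoring

project_id = "my-producer-project"  # placeholder
client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

# Metric type copied from the article's MQL query.
metric_filter = (
    'metric.type = '
    '"compute.googleapis.com/private_service_connect/producer/used_nat_ip_addresses"'
)

series = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": metric_filter,
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for ts in series:
    attachment = ts.resource.labels.get("service_attachment_id", "unknown")
    latest = ts.points[0].value.int64_value if ts.points else None
    print(attachment, "used NAT IPs:", latest)

A script like this can feed the same data into your own alerting or capacity dashboards alongside the console-based alerts the article recommends.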
Common issues

Selecting the right tool depends on the specific situation you encounter and where you are in the life cycle of your Private Service Connect journey. Before you start, gather the consumer and producer project details, and confirm that this is in fact a Private Service Connect issue and not a private services access problem. Generally, you can face issues during the setup or update of any related component or additional capability, or the issues can appear at runtime, when everything is configured but you run into connectivity or performance problems.

Issues during setup

  • Make sure that you are following the configuration guide and that you understand the scope and limitations.
  • Check for any error message or warning in the Logs Explorer.
  • Verify that the setup is compatible and supported as per the configuration guides.
  • See if any related quota has been exceeded, such as the Private Service Connect forwarding rules quota.
  • Confirm whether there is an organization policy that could prevent the configuration of Private Service Connect components.

Issues during runtime

  • Isolate the issue to the consumer or the producer side of the connection. If you are on the consumer side, check whether your endpoint or backend is accepted in the connection status on the Private Service Connect page. Otherwise, review the accept/reject connection list and the connection reconciliation setup on the producer side.
  • If your endpoint is unreachable, check by bypassing DNS resolution, and run a Connectivity Test to validate routes and firewalls from the source IP address to the Private Service Connect endpoint as the destination. On the service producer side, check whether the producer service is reachable within the producer VPC network and from an IP address in the Private Service Connect NAT subnet.
  • If there is a performance issue such as network latency or packet drops, check whether Live Data Plane Analysis is available to establish a baseline and isolate an issue with the application or service. Also check the Metrics Explorer for any connection or port exhaustion and packet drops.

Working with Cloud Support

Once you have pinpointed and analyzed the issue, you may need to reach out to Cloud Support for further assistance. To facilitate a smooth experience, be sure to explain your needs, clearly describe the business impact, and give enough context with all the information you have collected. View the full article
  16. Developing and running secure Docker applications demands a strategic approach, encompassing considerations like avoiding unnecessary bloat in images and access methods. One crucial aspect to master in Docker development is understanding image layering and optimization. Docker images are constructed using layers, each representing specific changes or instructions in the image’s build process. In this article, we’ll delve into the significance of Docker image layering, the importance of choosing minimal base images, and practical approaches like multi-stage builds. Additionally, we’ll discuss the critical practices of running applications as non-root users, checking images for vulnerabilities using tools like Docker Scout, and implementing Docker Content Trust for image integrity. This comprehensive guide aims to equip developers and operators with actionable insights to enhance the security and efficiency of Docker applications.

Understanding Docker image layering

Before we jump into Docker security aspects, we need to understand Docker image layering and optimization. For a better understanding, let’s consider this Dockerfile, retrieved from a sample repository. It’s a simple React program that prints “Hello World.” The core code uses React, a JavaScript library for building user interfaces. Docker images comprise layers, and each layer represents a set of file changes or instructions in the image’s construction. These layers are stacked on each other to form the complete image (Figure 1). To combine them, a “unioned filesystem” is created, which takes all of the layers of the image and overlays them together. These layers are immutable: when you’re building an image, you’re simply creating new filesystem diffs, not modifying previous layers.

Figure 1: Visual representation of layers in a Docker image.

When you build a Docker image, each instruction in your Dockerfile creates a new layer. Layers are cached, so if you make a change in your code and rebuild the image, only the layers affected by that change will be recreated, saving time and bandwidth. This layering system makes images efficient to use. You might notice that there are two COPY instructions (as shown in Figure 1). The first COPY instruction copies only package.json (and potentially package-lock.json) into the image. The second COPY instruction copies the remaining application code (excluding files already copied in the first COPY command). If only application code changes, the first two layers are cached, avoiding re-downloading and reinstalling dependencies, which can significantly speed up builds.

1. Choose a minimal base image

Docker Hub has millions of images, and choosing the right image for your application is important. It is always better to use a minimal base image with a small size, as slimmer images contain fewer dependencies, resulting in a smaller attack surface. A smaller image not only improves your image security but also reduces the time for pulling and pushing images, optimizing the overall development lifecycle.

Figure 2: Example of Docker images with different sizes.

As depicted in Figure 2, we opted for the node:21.6-alpine3.18 image due to its smaller footprint. We selected the Alpine image for our Node application below because it omits additional tools and packages present in the default Node image. This decision aligns with good security practices, as it minimizes the attack surface by eliminating components that are unnecessary for running your application.
# Use the official Node.js image with Alpine Linux as the base image
FROM node:21.6-alpine3.18
# Set the working directory inside the container to /app
WORKDIR /app
# Copy package.json and package-lock.json to the working directory
COPY package*.json ./
# Install Node.js dependencies based on the package.json
RUN npm install
# Copy all files from the current directory to the working directory in the container
COPY . .
# Expose port 3000
EXPOSE 3000
# Define the command to run your application when the container starts
CMD ["npm", "start"]

2. Use multi-stage builds

Multi-stage builds offer a great way to streamline Docker images, making them smaller and more secure. They allow us to trim down a hefty 1.9 GB image to a lean 140 MB by using different build stages. In this approach, we leverage multiple FROM statements and carefully pick only the necessary pieces from one stage to another. We have converted our Dockerfile to a multi-stage one (Figure 3). In the first stage, we use a Node.js image to build the app, manage dependencies, and create the application files (see the Dockerfile below). In the second stage, we copy the lightweight files generated in the first step and use Nginx to run them. The build tooling required to build the app is skipped in the final stage, which is why the final image is small and suitable for the production environment. It also illustrates that we don’t need to ship the heavyweight system we build on; we can copy the build output to a lighter runner to run the app.

Figure 3: High-level representation of Docker multi-stage build.

# Stage 1: Build the application
FROM node:21.6-alpine3.18 AS builder
# Set the working directory for the build stage
WORKDIR /app
# Copy package.json and package-lock.json
COPY package*.json ./
# Install dependencies
RUN npm install
# Copy the application source code into the container
COPY . .
# Build the application
RUN npm run build

# Stage 2: Create the final image
FROM nginx:1.20
# Set the working directory within the container
WORKDIR /app
# Copy the built application files from the builder stage to the nginx html directory
COPY --from=builder /app/build /usr/share/nginx/html
# Expose port 80 for the web server
EXPOSE 80
# Start nginx in the foreground
CMD ["nginx", "-g", "daemon off;"]

You can access this Dockerfile directly from a repository on GitHub.

3. Check your images for vulnerabilities using Docker Scout

Using the multi-stage Dockerfile above, you can run the following command to build a Docker image:

docker build -t react-app-multi-stage . -f Dockerfile.multi

Once the build process is complete, the CLI lets you view a summary of image vulnerabilities and recommendations. That’s what Docker Scout is all about.
 => exporting to image                                                     0.0s
 => => exporting layers                                                    0.0s
 => => writing image sha256:f348bcb19411fa1c4abf2e682f3dded7963c0c0c9b39c31804df5cd0e0f185d9  0.0s
 => => naming to docker.io/library/react-node-app                          0.0s

View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/sci2bo7xihgwnfihigd8x9uh1

What's Next?
  View a summary of image vulnerabilities and recommendations → docker scout quickview

Docker Scout analyzes the contents of container images and generates a report of packages and vulnerabilities that it detects, helping users to identify and remediate issues. Docker Scout image analysis is more than point-in-time scanning; the analysis is reevaluated continuously, meaning you don’t need to re-scan the image to see an updated vulnerability report. If your base image has a security concern, Docker Scout will check for updates and patches and suggest how to replace the image. If issues exist in other layers, Docker Scout will reveal precisely where they were introduced and make recommendations accordingly (Figure 4).

Figure 4: How Docker Scout works.

Docker Scout uses Software Bills of Materials (SBOMs) to cross-reference with streaming Common Vulnerabilities and Exposures (CVE) data to surface vulnerabilities (and potential remediation recommendations) as soon as possible. An SBOM is a nested inventory, a list of ingredients that make up software components. Docker Scout is built on a streaming, event-driven data model, providing actionable CVE reports. Once the SBOM exists, Docker Scout automatically checks existing SBOMs against new CVEs, so you will see automatic updates for new CVEs without re-scanning artifacts.

After building the image, open Docker Desktop (ensure you have the latest version installed), analyze the level of vulnerabilities, and fix them. You can also use Docker Scout from the Docker CLI, but Docker Desktop gives you a better way to visualize the results. Select Docker Scout from the sidebar and choose the image. Here, we have chosen react-app-multi-stage, which we built just now. As you can see, Scout immediately shows vulnerabilities and their severity. We can select View packages and CVEs to take a deeper look and get recommendations (Figure 5).

Figure 5: Docker Scout tab in Docker Desktop.

A window then opens showing a detailed report of the vulnerabilities and a layer-wise breakdown (Figure 6).

Figure 6: Detailed report of vulnerabilities.

To get recommendations for fixing the image vulnerabilities, select Recommended Fixes in the top-right corner, and a dialog box will open with the recommended fixes. As shown in Figure 7, it recommends upgrading Nginx from version 1.20 to 1.24, which has fewer vulnerabilities and fixes all the critical and high-severity issues. It is also worth noting that even though version 1.25 was available, Scout still recommends 1.24, because 1.25 has critical vulnerabilities compared to 1.24.

Figure 7: Recommendation tab for fixing vulnerabilities in Docker Desktop.

Now we need to rebuild our image, changing the base image of the final stage to the recommended version 1.24 (Figure 8), which will fix those vulnerabilities.

Figure 8: Advanced image analysis with Docker Scout.

The key features and capabilities of Docker Scout include:

  • Unified view: Docker Scout provides a single view of your application’s dependencies from all layers, allowing you to easily understand your image composition and identify remediation steps.
  • Event-driven vulnerability updates: Docker Scout uses an event-driven data model to continuously detect and surface vulnerabilities, ensuring that analysis is always up to date and based on the latest CVEs.
  • In-context remediation recommendations: Docker Scout provides integrated recommendations visible in Docker Desktop, suggesting remediation options for base image updates and dependency updates within your application code layers.

Note that Docker Scout is available through multiple interfaces, including the Docker Desktop and Docker Hub user interfaces, as well as a web-based user interface and a command-line interface (CLI) plugin. Users can view and interact with Docker Scout through these interfaces to gain a deeper understanding of the composition and security of their container images.

4. Use Docker Content Trust

Docker Content Trust (DCT) lets you sign and verify Docker images, ensuring they come from trusted sources and haven’t been tampered with. This process acts like a digital seal of approval for images, whether they are signed by people or by automated processes. To enable Docker Content Trust, follow these steps.

Initialize Docker Content Trust. Before you can sign images, ensure that Docker Content Trust is initialized. Open a terminal and run the following command:

export DOCKER_CONTENT_TRUST=1

Sign the Docker image. Build and sign the image using the following commands:

docker build -t <your_namespace>/node-app:v1.0 .
docker trust sign <your_namespace>/node-app:v1.0

v1.0: digest: sha256:5fa48a9b4e52a9d9681a5786b4885be080668d06019e91eece6dfded5a0f8a47 size: 1986
Signing and pushing trust metadata
Enter passphrase for <namespace> key with ID 96c9857:
Successfully signed docker.io/<your_namespace>/node-app:v1.0

Push the signed image to a registry. You can push the signed Docker image to a registry with:

docker push <your_namespace>/node-app:v1.0

Verify the signature. To verify the signature of an image, use the following command:

docker trust inspect --pretty <your_namespace>/node-app:v1.0

Signatures for <your_namespace>/node-app:v1.0
SIGNED TAG   DIGEST                                            SIGNERS
v1.0         5fa48a9b4e52a9d968XXXXXX19e91eece6dfded5a0f8a47   <your_namespace>

List of signers and their keys for <your_namespace>/node-app:v1.0
SIGNER       KEYS
ajeetraina   96c985786950

Administrative keys for <your_namespace>/node-app:v1.0
Repository Key: 47214511f851e28018a7b0443XXXXXXc7d5846bf6f7
Root Key: 52bae142a9ac98a473c5275bXXXXXX2f4f5068081d567903dd

By following these steps, you’ve enabled Docker Content Trust for your Node.js application, signing and verifying the image to enhance security and ensure the integrity of your containerized application throughout its lifecycle.

5. Practice least privilege

Security is crucial in containerized environments. Embracing the principle of least privilege ensures that Docker containers operate with only the necessary permissions, reducing the attack surface and mitigating potential security risks. Let’s explore specific best practices for achieving least privilege in Docker.

Run as a non-root user

We minimize potential risks by running applications without unnecessary high-level access (root privileges). Many applications don’t need root privileges, so in the Dockerfile we can create a non-root system user and run the application inside the container with that user’s limited privileges, improving security and adhering to the principle of least privilege.
# Stage 1: Build the application
FROM node:21.6-alpine3.18 AS builder
# Set the working directory for the build stage
WORKDIR /app
# Copy package.json and package-lock.json
COPY package*.json ./
# Install dependencies
RUN npm install
# Copy the application source code into the container
COPY . .
# Build the application
RUN npm run build

# Stage 2: Create the final image
FROM nginx:1.20
# Set the working directory within the container
WORKDIR /app
# Set ownership and permissions for the nginx user
RUN chown -R nginx:nginx /app && \
    chmod -R 755 /app && \
    chown -R nginx:nginx /var/cache/nginx && \
    chown -R nginx:nginx /var/log/nginx && \
    chown -R nginx:nginx /etc/nginx/conf.d
# Create the nginx pid file and set appropriate permissions
RUN touch /var/run/nginx.pid && \
    chown -R nginx:nginx /var/run/nginx.pid
# Switch to the nginx user
USER nginx
# Copy the built application files from the builder stage to the nginx html directory
COPY --from=builder /app/build /usr/share/nginx/html
# Expose port 80 for the web server
EXPOSE 80
# CMD to start nginx in the foreground
CMD ["nginx", "-g", "daemon off;"]

If we are using Node as the final base image (Figure 9), we can add USER node to our Dockerfile to run the application as a non-root user. The node user is created within the Node image with restricted permissions, unlike the root user, which has full control over the system. By default, the Docker Node image includes a non-root node user that you can use to avoid running your application container as root.

Figure 9: Images tab in Docker Desktop.

Limit capabilities

Limiting Linux kernel capabilities is crucial for controlling the privileges available to containers. Docker, by default, runs with a restricted set of capabilities. You can enhance security by dropping unnecessary capabilities and adding only the ones required:

docker run --cap-drop all --cap-add CHOWN node-app

Let’s take our simple Hello World React containerized app and see how it fits these least-privilege practices. Capabilities are applied at runtime with docker run rather than baked into the image, so the Dockerfile itself stays simple:

FROM node:21.6-alpine3.18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

Add the --no-new-privileges flag

Running containers with the --security-opt=no-new-privileges flag is essential to prevent privilege escalation through setuid or setgid binaries. The setuid and setgid bits allow a user to run an executable with the file system permissions of the executable’s owner or group, respectively, and to change behavior in directories. This flag ensures that the container’s privileges cannot be escalated during runtime.

docker run --security-opt=no-new-privileges node-app

Disable inter-container communication

Inter-container communication (icc) is enabled by default in Docker, allowing containers to communicate over the docker0 bridge network. docker0 bridges your container’s network (or any Compose networks) to the host’s main network interface, meaning your containers can access the network and you can access the containers. Disabling icc enhances security by requiring communication to be defined explicitly, for example with --link options. Note that icc is a daemon-level setting rather than a docker run flag:

dockerd --icc=false

Use Linux kernel security features

When you’re running applications in Docker containers, you want to make sure they’re as secure as possible. One way to do this is by using kernel security features such as seccomp and Linux Security Modules (LSMs) like AppArmor or SELinux.
These tools can provide additional layers of protection for Linux systems and containerized applications by controlling which actions a container can perform on the host system:

  • Seccomp is a Linux kernel feature that allows a process to make a one-way transition into a “secure” state where it’s restricted to a reduced set of system calls. It restricts the system calls that a process can make, reducing its attack surface and potential impact if compromised.
  • AppArmor confines individual programs to predefined rules, specifying their allowed behavior and limiting access to files and resources.
  • SELinux enforces mandatory access control policies, defining rules for interactions between processes and system resources to mitigate the risk of privilege escalation and enforce least-privilege principles.

By leveraging these mechanisms, administrators can enhance the security posture of their systems and applications, safeguarding against various threats and vulnerabilities. For instance, for a simple Hello World React application containerized with Docker, the default seccomp profile applies unless it is overridden with the --security-opt option. This flexibility lets administrators explicitly define security policies based on their specific requirements, as demonstrated in the following command:

docker run --rm -it --security-opt seccomp=/path/to/seccomp/profile.json node-app

Customize seccomp profiles

Customizing seccomp profiles at runtime offers several benefits:

  • Flexibility: By separating the seccomp configuration from the Dockerfile, you can adjust the security settings without modifying the image itself. This approach allows for easier experimentation and iteration.
  • Granular control: Custom seccomp profiles let you precisely define which system calls are permitted or denied within your containers. This level of granularity allows you to tailor the security settings to the specific requirements of your application.
  • Security compliance: In environments with strict security requirements, custom seccomp profiles can help ensure compliance by enforcing tighter restrictions on containerized processes.

Limit container resources

In Docker, containers may consume CPU and RAM up to the extent allowed by the host kernel scheduler. While this flexibility facilitates efficient resource utilization, it also introduces potential risks:

  • Security breaches: In the unfortunate event of a container compromise, attackers could exploit its unrestricted access to host resources for malicious activities. For instance, a compromised container could be used to mine cryptocurrency or execute other nefarious actions.
  • Performance bottlenecks: Resource-intensive containers can monopolize system resources, leading to performance degradation or service outages across your applications.

To mitigate these risks effectively, it’s crucial to establish clear resource limits for your containers:

  • Allocate resources wisely: Assign specific amounts of CPU and RAM to each container to ensure fair distribution and prevent resource dominance.
  • Enforce boundaries: Set hard limits that containers cannot exceed, effectively containing potential damage and thwarting resource-exhaustion attacks.
  • Promote harmony: Efficient resource management ensures stability, allowing containers to operate smoothly and fulfill their tasks without contention.
For example, to limit CPU usage, you can run the container with:

docker run -it --cpus=".5" node-app

This command limits the container to 50% of a single CPU core. Remember, setting resource limits isn’t just about efficiency; it’s a vital security measure that safeguards your host system and promotes harmony among your containerized applications. To prevent potential denial-of-service (DoS) attacks, limiting resources such as memory, CPU, file descriptors, and processes is crucial. Docker provides mechanisms to set these limits for individual containers:

--restart=on-failure:<number_of_restarts>
--ulimit nofile=<number>
--ulimit nproc=<number>

By diligently adhering to these least-privilege principles, you can establish a robust security posture for your Docker containers. For an illustration of applying several of these runtime flags programmatically, see the Python sketch after this item.

6. Choose the right base image

Finding the right image can seem daunting with more than 8.3 million repositories on Docker Hub. Two beacons can help guide you toward safe waters: Docker Official Images (DOI) and Docker Verified Publisher (DVP) badges.

  • Docker Official Images (marked by a blue badge shield) offer a curated set of open source and drop-in solution repositories. These are your go-to for common bases like Ubuntu, Python, or Nginx. Imagine them as trusty ships, built with quality materials and regularly inspected for seaworthiness.
  • Docker Verified Publisher images (signified by a gold check mark) are like trusted partners: organizations that have teamed up with Docker to offer high-quality images. Docker verifies the authenticity and security of their content, giving you extra peace of mind. Think of them as sleek yachts, built by experienced shipwrights and certified by maritime authorities.

Remember that Docker Official Images are a great starting point for common needs, and Verified Publisher images offer an extra layer of trust and security for crucial projects.

Conclusion

Optimizing Docker images for security involves a multifaceted approach addressing image size, access controls, and vulnerability management. By understanding Docker image layering and leveraging practices such as choosing minimal base images and employing multi-stage builds, developers can significantly enhance efficiency and security. Running applications with least privilege, monitoring vulnerabilities with tools like Docker Scout, and implementing content trust further fortify the containerized ecosystem. As the Docker landscape evolves, staying informed about best practices and adopting proactive security measures is paramount. This guide serves as a valuable resource, empowering developers and operators to navigate the seas of Docker security with confidence and ensuring their applications are not only functional but also resilient to potential threats.

Learn more: Subscribe to the Docker Newsletter. Get the latest release of Docker Desktop. Get started with Docker Scout. Vote on what’s next! Check out our public roadmap. Have questions? The Docker community is here to help. New to Docker? Get started. View the full article
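As a companion to the hardening checklist above, here is an illustrative sketch (not part of the original article) that applies several of the recommended runtime flags through the Docker SDK for Python: dropped capabilities, no-new-privileges, CPU and memory ceilings, a file-descriptor ulimit, and a restart policy. The image name node-app is a placeholder.

import docker

client = docker.from_env()

# Run the placeholder "node-app" image with a least-privilege runtime profile:
# all capabilities dropped except CHOWN, privilege escalation blocked,
# and hard CPU/memory ceilings enforced.
container = client.containers.run(
    "node-app",
    detach=True,
    cap_drop=["ALL"],
    cap_add=["CHOWN"],
    security_opt=["no-new-privileges"],
    nano_cpus=500_000_000,  # 0.5 CPU, equivalent to --cpus=".5"
    mem_limit="256m",       # hard memory ceiling
    ulimits=[docker.types.Ulimit(name="nofile", soft=1024, hard=1024)],
    restart_policy={"Name": "on-failure", "MaximumRetryCount": 3},
)
print(container.short_id, container.status)

The same settings map one-to-one onto the docker run flags shown in the article, so either form can be used depending on whether containers are launched by hand or from automation.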
  17. Introduction

Today customers want to reduce manual operations for deploying and maintaining their infrastructure. The recommended method to deploy and manage infrastructure on AWS is to follow the Infrastructure-as-Code (IaC) model using tools like AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), or Terraform.

One of the critical components in terraform is managing the state file, which keeps track of your configuration and resources. When you run terraform in an AWS CI/CD pipeline, the state file has to be stored in a secured, common path to which the pipeline has access. You need a mechanism to lock it when multiple developers in the team want to access it at the same time.

In this blog post, we will explain how to manage terraform state files in AWS, best practices for configuring them in AWS, and an example of how you can manage them efficiently in your Continuous Integration pipeline in AWS when used with AWS Developer Tools such as AWS CodeCommit and AWS CodeBuild. This blog post assumes you have a basic knowledge of terraform, AWS Developer Tools, and AWS CI/CD pipelines. Let's dive in!

Challenges with handling state files

By default, the state file is stored locally where terraform runs, which is not a problem if you are a single developer working on the deployment. However, if you are not, storing state files locally is not ideal, as you may run into the following problems:

When working in teams or collaborative environments, multiple people need access to the state file
Data in the state file is stored in plain text, which may contain secrets or sensitive information
Local files can get lost, corrupted, or deleted

Best practices for handling state files

The recommended practice for managing state files is to use terraform's built-in support for remote backends. These are:

Remote backend on Amazon Simple Storage Service (Amazon S3): You can configure terraform to store state files in an Amazon S3 bucket, which provides a durable and scalable storage solution. Storing on Amazon S3 also enables collaboration, allowing you to share the state file with others.

Remote backend on Amazon S3 with Amazon DynamoDB: In addition to using an Amazon S3 bucket for managing the files, you can use an Amazon DynamoDB table to lock the state file. This will allow only one person to modify a particular state file at any given time. It will help to avoid conflicts and enable safe concurrent access to the state file.

There are other options available as well, such as a remote backend on Terraform Cloud and third-party backends. Ultimately, the best method for managing terraform state files on AWS will depend on your specific requirements. When deploying terraform on AWS, the preferred choice for managing state is using Amazon S3 with Amazon DynamoDB.

AWS configurations for managing state files

Create an Amazon S3 bucket using terraform. Implement security measures for the Amazon S3 bucket by creating an AWS Identity and Access Management (AWS IAM) policy or Amazon S3 bucket policy. Thus you can restrict access, configure object versioning for data protection and recovery, and enable server-side encryption (AES-256 or SSE-KMS) for encryption control.

Next, create an Amazon DynamoDB table using terraform with the primary key set to LockID. You can also set any additional configuration options such as read/write capacity units. Once the table is created, you will configure the terraform backend to use it for state locking by specifying the table name in the terraform block of your configuration.
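As a hedged sketch of that backend wiring, the partial backend configuration below supplies the bucket and lock-table names at init time; the bucket, key, and table names are illustrative placeholders, and it assumes a backend "s3" {} block is already declared in your configuration:

terraform init \
  -backend-config="bucket=my-tfstate-bucket" \
  -backend-config="key=env/dev/terraform.tfstate" \
  -backend-config="region=eu-central-1" \
  -backend-config="dynamodb_table=my-tfstate-lock" \
  -backend-config="encrypt=true"

The same values can instead be hard-coded in the backend block itself; passing them at init time simply makes it easier to reuse one configuration across accounts or environments.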
For a single AWS account with multiple environments and projects, you can use a single Amazon S3 bucket. If you have multiple applications in multiple environments across multiple AWS accounts, you can create one Amazon S3 bucket for each account. In that Amazon S3 bucket, you can create appropriate folders for each environment, storing project state files with specific prefixes.

Now that you know how to handle terraform state files on AWS, let's look at an example of how you can configure them in a Continuous Integration pipeline in AWS.

Architecture

Figure 1: Example architecture on how to use terraform in an AWS CI pipeline

This diagram outlines the workflow implemented in this blog:

The AWS CodeCommit repository contains the application code
The AWS CodeBuild job contains the buildspec files and references the source code in AWS CodeCommit
The AWS Lambda function contains the application code created after running terraform apply
Amazon S3 contains the state file created after running terraform apply
Amazon DynamoDB locks the state file present in Amazon S3

Implementation

Pre-requisites

Before you begin, you must complete the following prerequisites:

Install the latest version of the AWS Command Line Interface (AWS CLI)
Install the latest version of terraform
Install the latest Git version and set up git-remote-codecommit
Use an existing AWS account or create a new one
Use an AWS IAM role with a role profile, role permissions, a role trust relationship, and user permissions to access your AWS account via a local terminal

Setting up the environment

You need an AWS access key ID and secret access key to configure the AWS CLI. To learn more about configuring the AWS CLI, follow these instructions.

Clone the repo for the complete example:

git clone https://github.com/aws-samples/manage-terraform-statefiles-in-aws-pipeline

After cloning, you should see the following folder structure:

Figure 2: AWS CodeCommit repository structure

Let's break down the terraform code into two parts: one for preparing the infrastructure and another for preparing the application.

Preparing the Infrastructure

The main.tf file is the core component; it does the following:

It creates an Amazon S3 bucket to store the state file. We configure the bucket ACL, bucket versioning, and encryption so that the state file is secure.
It creates an Amazon DynamoDB table, which will be used to lock the state file.
It creates two AWS CodeBuild projects, one for 'terraform plan' and another for 'terraform apply'.

Note: It also has the code block (commented out by default) to create the AWS Lambda function, which you will use at a later stage.

The AWS CodeBuild projects should be able to access Amazon S3, Amazon DynamoDB, AWS CodeCommit, and AWS Lambda. So, the AWS IAM role with the appropriate permissions required to access these resources is created via the iam.tf file.

Next, you will find two buildspec files named buildspec-plan.yaml and buildspec-apply.yaml that execute the terraform commands terraform plan and terraform apply, respectively.

Modify the AWS region in the provider.tf file. Update the Amazon S3 bucket name, Amazon DynamoDB table name, AWS CodeBuild compute types, and AWS Lambda role and policy names to the required values using the variable.tf file. You can also use this file to easily customize parameters for different environments.

With this, the infrastructure setup is complete. You can use your local terminal and execute the following commands in the same order to deploy the above-mentioned resources in your AWS account.
terraform init
terraform validate
terraform plan
terraform apply

Once the apply is successful and all the above resources have been successfully deployed in your AWS account, proceed with deploying your application.

Preparing the Application

In the cloned repository, use the backend.tf file to create your own Amazon S3 backend to store the state file. By default, it will have the values below, which you can override with your required values.

bucket = "tfbackend-bucket"
key    = "terraform.tfstate"
region = "eu-central-1"

The repository has sample Python code stored in main.py that returns a simple message when invoked. In the main.tf file, you can find the blocks of code below that create and deploy the Lambda function using the main.py code (uncomment these code blocks).

data "archive_file" "lambda_archive_file" {
……
}

resource "aws_lambda_function" "lambda" {
……
}

Now you can deploy the application using AWS CodeBuild instead of running terraform commands locally, which is the whole point and advantage of using AWS CodeBuild. Run the two AWS CodeBuild projects to execute terraform plan and terraform apply again. Once successful, you can verify your deployment by testing the code in AWS Lambda.

To test the Lambda function (console):

Open the AWS Lambda console and select your function "tf-codebuild"
In the navigation pane, in the Code section, click Test to create a test event
Provide your required name, for example "test-lambda"
Accept the default values and click Save
Click Test again to trigger your test event "test-lambda"

It should return the sample message you provided in your main.py file. In the default case, it will display the "Hello from AWS Lambda !" message as shown below.

Figure 3: Sample AWS Lambda function response

To verify your state file, go to the Amazon S3 console and select the backend bucket created (tfbackend-bucket). It will contain your state file.

Figure 4: Amazon S3 bucket with terraform state file

Open the Amazon DynamoDB console and check your table tfstate-lock; it will have an entry with a LockID.

Figure 5: Amazon DynamoDB table with LockID

Thus, you have securely stored and locked your terraform state file using a terraform backend in a Continuous Integration pipeline.

Cleanup

To delete all the resources created as part of the repository, run the command below from your terminal.

terraform destroy

Conclusion

In this blog post, we explored the fundamentals of terraform state files, discussed best practices for their secure storage within AWS environments, and covered mechanisms for locking these files to prevent conflicting concurrent updates. Finally, we showed you an example of how efficiently you can manage them in a Continuous Integration pipeline in AWS. You can apply the same methodology to manage state files in a Continuous Delivery pipeline in AWS. For more information, see CI/CD pipeline on AWS, Terraform backend types, and Purpose of terraform state.

Arun Kumar Selvaraj

Arun Kumar Selvaraj is a Cloud Infrastructure Architect with AWS Professional Services. He loves building world-class capabilities that provide thought leadership, operating standards, and a platform to deliver accelerated migration and development paths for his customers. His interests include Migration, CCoE, IaC, Python, DevOps, Containers, and Networking.

Manasi Bhutada

Manasi Bhutada is an ISV Solutions Architect based in the Netherlands. She helps customers design and implement well-architected solutions in AWS that address their business problems. She is passionate about data analytics and networking.
Beyond work she enjoys experimenting with food, playing pickleball, and diving into fun board games. View the full article
  18. Learn about passphrases and understand how you can use these strong yet memorable phrases to safeguard your accounts against hackers. View the full article
  19. In this TechRepublic exclusive, Forrester's CIO advises that the key to evaluating emerging technology is tying it to your organization's core business strategy. View the full article
  20. Companies worldwide are committed to reducing their IT carbon footprint, championing a more sustainable future through initiatives focused on efficiency and cost optimization. Cloud sustainability is not only about reducing the environmental impact of cloud usage, but also about making smart business decisions that align to corporate values, adhere to regulatory requirements, and enable the pursuit of long-term business goals. To understand the impact of cloud computing on carbon emissions, precise measurement, trustworthy data, and robust tools are essential. That’s why we’re excited to announce two new capabilities to optimize your Microsoft Azure emissions: Azure Carbon Optimization (preview) is a free, cutting-edge capability that empowers Azure developers and IT professionals to understand and optimize emissions stemming from Azure usage. By providing insights into carbon emissions and offering recommendations for enhancing cloud efficiency, this tool aligns with the Microsoft commitment to environmental responsibility and supports you in achieving your cloud sustainability goals. Microsoft Azure emissions insights (preview) in sustainability data solutions in Microsoft Fabric enables you to unify and analyze emissions data for Azure usage. By having access to your Azure emissions data in Microsoft Fabric, you can query and drill down into Azure resource level emissions for advanced reporting and analysis.  Both tools offer a holistic solution for organizations aiming to reduce their carbon footprint by optimizing specific resources or workloads within Azure. With Azure Carbon Optimization (preview), engineering and IT teams can use ready-to-consume insights and recommendations for optimizing their carbon emissions, all within the Azure portal. Microsoft Azure emissions insights (preview) enable data analysts and engineers to dive deeper into emissions data, allowing them to slice and dice the data and perform deeper analytics using Microsoft Fabric. Once your organization can access insights into the carbon emissions generated at the resource or workload level, reduction efforts can begin. This involves optimizing cloud systems for efficiency to benefit the environment and enhance overall performance. Azure administrators can already see a company-wide view of cloud emissions in the Emissions Impact Dashboard. To optimize your carbon footprint, you can take advantage of more granular insights into carbon emissions originating from specific resources or workloads. Like any major organizational shift, reducing carbon emissions requires contributions from every corner of your company. In this blog, we will not only explore the benefits of Azure Carbon Optimization and Microsoft Azure emissions insights, but also how the FinOps framework can guide your business through the complexities of carbon emission reduction to help achieve both your environmental and financial goals. Align IT sustainability with ESG regulations Organizations around the world are setting carbon neutrality goals for themselves, which are furthered by new environmental regulations and standards introduced by global government and regulatory bodies, with a significant driver being environmental, social, and governance (ESG) regulations. These governmental standards dictate ESG-related actions, reporting, and disclosures. 
Microsoft provides offerings to help customers with their ESG reporting needs with tools and products available with Microsoft Cloud for Sustainability to help your organization collect and manage more ESG data and get fuller visibility into your environmental impact. Our goal is to help prepare you for any new reporting requirements by compiling a comprehensive ESG data estate. IT sustainability plays a pivotal role in a company’s ESG management strategy because it serves as a cornerstone for mitigating environmental impact, ensuring responsible cloud usage, and reinforcing the overall commitment to sustainable development practices. There are also direct economic benefits for reducing carbon emissions, such as long-term operational cost savings. Above all, organizations that proactively address environmental issues and reduce their carbon footprint will be better positioned for long-term success, especially in a business landscape where sustainability is increasingly important. Measure and reduce your emissions with Azure Carbon Optimization Our free Azure Carbon Optimization tool, now in public preview and accessible through your Azure portal, is a window into your cloud resources emissions, ultimately leading to recommendations on how to cut back. It empowers Azure users to closely monitor and optimize their carbon footprint. Azure Carbon Optimization is designed to provide everyone in your organization, from developers, to architects, to IT professionals, with a resource-level view of emissions data. This empowers your engineers to take proactive measures to mitigate emissions and track progress right from the Azure portal. Azure Carbon Optimization uses the same carbon accounting methodology as the Emissions Impact Dashboard. Developers can work towards maximizing resource utilization while minimizing carbon emissions from the cloud, helping ensure that every deployed resource serves a purpose, eliminates waste, and reduces environmental impact. The tool also presents carbon emission reduction in equivalent terms that are easy for anyone to understand. Subsequently, it provides developers with carbon savings recommendations that are based on analyzing resource utilization. Suggestions include deleting or resizing underutilized resources. With these ready-to-consume recommendations, you can optimize your Azure usage, avoid carbon emissions, and promote sustainable development practices. This way, you not only enhance your environmental performance, but also achieve cost savings and efficiency. Perform even deeper Azure emissions analysis with Microsoft Fabric Microsoft Azure emissions insights, now in public preview, is a part of the sustainability data solutions in Microsoft Fabric. It helps unify, process, query, and perform deeper analysis of Azure emissions data. In addition to emissions data and related pipelines, Power BI dashboards are provided with Microsoft Azure emissions insights to drill-down and compare emissions data across subscriptions and resources. This helps IT administrators identify patterns in Azure emissions that evolve with time and change with Azure resource usage. Unified Azure emissions data empowers data analysts to enrich the emissions data with custom information such as department using subscriptions and resources. They can then query the data and build analytic models for interesting insights such as Azure emissions by departments and seasonality of emissions by usage. 
Leverage FinOps best practices to help optimize carbon emissions Fostering a culture of accountability, efficiency, and governance across an organization stands as a key objective within the FinOps framework, which aims to help organizations optimize their cloud to maximize business value. Efficiency has a positive impact on innovation by freeing up resources and allowing organizations to invest more in modernization, research, and development. FinOps supports the customer journey by establishing a cross-functional team that includes finance, IT, engineers, and business leaders to create a culture of accountability where everyone takes ownership of their cloud usage. As ESG regulations compel adherence to complex emissions reporting requirements, integrating FinOps best practices can help teams to better manage and optimize carbon emissions. When viewed through the lens of environmental awareness, FinOps can assist with best practices that foster accountability, efficiency, and governance to enable data-driven decisions. Leveraging these best practices in tandem with Azure Carbon Optimization and Microsoft Azure emissions insights empowers your organization to be a catalyst for change, transforming cloud practices into a force for sustainability by helping track, analyze, and optimize emissions towards a greener, more responsible cloud ecosystem. Reach your sustainability goals with data-driven Azure insights By employing these capabilities and adhering to FinOps practices, your organization can actively track, assess, and mitigate your carbon emissions. You’ll not only gain a detailed understanding of the emissions impact associated with your Azure resources, but also valuable insight into your compliance posture for any coming ESG regulations. Next steps Visit the Azure Carbon Optimization documentation and our new learning collection to discover more about how to start leveraging the data-driven insights provided by Azure Carbon Optimization for a more environmentally responsible and efficient operation. Continue your sustainability journey with the Azure Well-Architected Framework sustainability guidance and explore Sustainability outcomes and benefits for business through the Cloud Adoption Framework. This guidance provides insights into end-to-end sustainability considerations in your cloud estate. Visit the documentation for Microsoft Azure emissions insights and this new blog to learn more about deploying it in your Fabric environment and get started with centralizing and analyzing your Azure emissions data. This capability can be leveraged to analyze the trends of your Azure emissions over time by subscriptions and resources. For more on how FinOps best practices can help you maximize your cloud business value while addressing the complexities of carbon emission reduction, explore Microsoft’s resources for FinOps: Assess your organization’s gaps using the Microsoft FinOps Review Assessment. Gain hands-on experience with Microsoft solutions that empower FinOps through the Microsoft FinOps Interactive Guides. Explore a range of related Microsoft products and services on the FinOps on Azure homepage. Visit the Azure Carbon Optimization documentation Start leveraging data-driven insights and reduce your emissions today Learn more The post Achieving sustainable growth with Azure and FinOps best practices appeared first on Microsoft Azure Blog. View the full article
  21. In Cloudera deployments on public cloud, one of the key configuration elements is the DNS. Get it wrong and your deployment may become wholly unusable, with users unable to access and use the Cloudera data services. If the DNS setup is less than ideal, connectivity and performance issues may arise. In this blog, we'll take you through our tried and tested best practices for setting up your DNS for use with Cloudera on Azure.

To get started and give you a feel for the DNS dependencies, these are the Azure managed services used in an Azure deployment of Cloudera:

AKS cluster: data warehouse, data engineering, machine learning, and DataFlow
MySQL database: data engineering
Storage account: all services
Azure Database for PostgreSQL: data lake and data hub clusters
Key vault: all services

Typical customer governance restrictions and their impact

Most Azure users use private networks with a firewall as egress control. Most users have restrictions on firewalls for wildcard rules. Cloudera resources are created on the fly, which means wildcard rules may be declined by the security team. Most Azure users use a hub-spoke network topology. DNS servers are usually deployed in the hub virtual network or an on-prem data center instead of in the Cloudera VNET. That means if DNS is not configured correctly, the deployment will fail.

Most Cloudera customers deploying on Azure allow the use of service endpoints; a smaller set of organizations do not. A service endpoint is a simpler way to allow resources on a private network to access managed services on Azure. If service endpoints are not allowed, firewalls and private endpoints are the other two options. Most cloud users do not like opening firewall rules, because that introduces the risk of exposing private data on the internet. That leaves private endpoints as the only option, which also introduces additional DNS configuration for the private endpoints.

Connectivity from a private network to Azure managed services

Firewall to internet: Route from the firewall to the Azure managed service endpoint on the internet directly.

Service endpoint: Azure provides service endpoints for resources on private networks to access the managed services on the internet without going through the firewall. That can be configured at a subnet level. Since Cloudera resources are deployed in different subnets, this configuration must be enabled on all subnets. The DNS records of the managed services using service endpoints will be on the internet and managed by Microsoft. The IP address of this service will be a public IP, routable from the subnet. Please refer to the Microsoft documentation for details. Not all managed services support service endpoints. In a Cloudera deployment scenario, only storage accounts, PostgreSQL DB, and Key Vault support service endpoints. Fortunately, most users allow service endpoints. If a customer doesn't allow service endpoints, they have to go with private endpoints, which require configuration similar to what follows.

Private endpoint: A private endpoint creates a network interface with a private IP address, associated with a private link service, so that other resources in the private network can access the managed service through that private IP address. The key here is for the private resources to resolve the service's FQDN to that private IP address.
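As a quick, hedged sanity check of that resolution from inside the VNET (the storage account name below is a placeholder), you can compare what the service FQDN resolves to:

# Run from a VM inside the Cloudera VNET
nslookup mystorageacct.blob.core.windows.net

# With a working private DNS path this returns the private endpoint IP (for example, 10.x.x.x);
# if it returns a public IP instead, the private endpoint's DNS record is not visible to this network's resolver.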
There are two options to store the DNS record:

Azure managed public DNS zones will always be there, but they store different types of IP addresses for the private endpoint. For example: for a storage account private endpoint, the public DNS zone stores the public IP address of that service; for an AKS API server private endpoint, the public DNS zone stores the private IP of that service.

Azure private DNS zone: the DNS records will be synchronized to the Azure Default DNS of the linked VNETs.

Private endpoints are available for all Azure managed services that are used in Cloudera deployments. As a consequence, for storage accounts, users either use service endpoints or private endpoints. Because the public DNS zone will always return a public IP, the private DNS zone becomes a mandatory configuration. For AKS, the two DNS alternatives are both suitable. The challenges of private DNS zones will be discussed next.

Challenges of private DNS zones on an Azure private network

Important assumptions

As mentioned above for the typical scenario, most Azure users use a hub-and-spoke network architecture and deploy custom private DNS servers on the hub VNET. The DNS records will be synchronized to the Azure Default DNS of linked VNETs.

Simple architecture use cases

One VNET with a private DNS zone: When a private endpoint is created, Cloudera on Azure will register the private endpoint in the private DNS zone. The DNS record will be synchronized to the Azure Default DNS of the linked VNET. If users use a custom private DNS server, they can configure a conditional forwarder to Azure Default DNS for the domain suffix of the FQDN.

Hub-and-spoke VNETs with Azure Default DNS: With hub-spoke VNETs and Azure Default DNS, that is still acceptable. The only problem is that resources on un-linked VNETs will not be able to access the AKS cluster. But since AKS is used by Cloudera, that does not pose any major issues.

The challenging part

The most popular network architecture among Azure customers is a hub-spoke network with custom private DNS servers deployed either on the hub VNET or in an on-premises network. Since DNS records are not synchronized to the Azure Default DNS of the hub VNET, the custom private DNS server cannot find the DNS record for the private endpoint. And because the Cloudera VNET is using the custom private DNS server on the hub VNET, the Cloudera resources on the Cloudera VNET will go to that custom private DNS server for DNS resolution of the private endpoint's FQDN. The provisioning will fail. With the DNS server deployed in the on-prem network, there is no Azure Default DNS associated with the on-prem network, so the DNS server cannot find the DNS record for the private endpoint's FQDN.

Configuration best practices

Against this background, these are the configuration options:

Option 1: Disable the private DNS zone

Use the Azure managed public DNS zone instead of a private DNS zone. For data warehouse: create data warehouses through the Cloudera command line interface with the parameter "privateDNSZoneAKS" set to "None". For Liftie-based data services: the entitlement "LIFTIE_AKS_DISABLE_PRIVATE_DNS_ZONE" must be set. Customers can request this entitlement to be set either through a JIRA ticket or by having their Cloudera solution engineer make the request on their behalf. The sole drawback of this option is that it does not apply to data engineering, since that data service will create and use a MySQL private DNS zone on the fly. There is at present no option to disable private DNS zones for data engineering.
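To make the linked-VNET behavior above concrete, here is a hedged Azure CLI sketch of creating a private DNS zone and linking a VNET to it; the resource group, zone, and network names are placeholders, and the same commands apply to the pre-created zone option described next:

# Create a private DNS zone for PostgreSQL private endpoints
az network private-dns zone create \
  --resource-group cloudera-rg \
  --name privatelink.postgres.database.azure.com

# Link the zone to the Cloudera VNET so its Azure Default DNS can resolve the records;
# repeat for the hub VNET if the custom DNS server lives there
az network private-dns link vnet create \
  --resource-group cloudera-rg \
  --zone-name privatelink.postgres.database.azure.com \
  --name cloudera-vnet-link \
  --virtual-network cloudera-vnet \
  --registration-enabled false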
Option 2: Pre-create private DNS zones

Pre-create private DNS zones and link both the Cloudera and hub VNETs to them. The advantage of this approach is that both data warehouse and Liftie-based data services support pre-created private DNS zones. There are, however, also a few drawbacks:

For Liftie, the private DNS zone needs to be configured when registering the environment. Once past the environment registration stage, it cannot be configured.
Data engineering (DE) will need a private DNS zone for MySQL, and it doesn't support pre-configured private DNS zones.
On-premises networks can't be linked to a private DNS zone. If the DNS server is on an on-prem network, there are no workable solutions.

Option 3: Create DNS servers as forwarders

Create a couple of DNS servers (for HA considerations) with a load balancer in the Cloudera VNET, and configure conditional forwarding to the Azure Default DNS of the Cloudera VNET. Configure conditional forwarding from the company's custom private DNS server to the DNS servers in the Cloudera subnet. The drawback of this option is that additional DNS servers are required, which leads to additional administration overhead for the DNS team.

Option 4: Azure-managed DNS resolver

Create a dedicated /28 subnet in the Cloudera VNET for an Azure DNS Private Resolver inbound endpoint. Configure conditional forwarding from the custom private DNS server to the Azure DNS Private Resolver inbound endpoint.

Summary

Bringing all things together, consider these best practices for setting up your DNS with Cloudera on Azure:

For the storage account, Key Vault, and PostgreSQL DB: Use service endpoints as the first choice. If service endpoints are not allowed, pre-create private DNS zones, link them to the VNET where the DNS server is deployed, and configure conditional forwarders from the custom private DNS server to Azure Default DNS. If the custom private DNS server is deployed in the on-premises network, use the Azure DNS Private Resolver or another DNS server as a DNS forwarder on the Cloudera VNET, and conditionally forward the DNS lookups from the private DNS server to the resolver endpoint.

For the data warehouse, DataFlow, or machine learning data services: Disable the private DNS zone and use the public DNS zone instead.

For the data engineering data service: Configure the Azure DNS Private Resolver or another DNS server as a DNS forwarder on the Cloudera VNET, and conditionally forward the DNS lookups from the private DNS server to the resolver endpoint.

Please refer to the Microsoft documentation for the details of setting up an Azure DNS Private Resolver. For more background reading on network and DNS specifics for Azure, have a look at our documentation for the various data services: DataFlow, Data Engineering, Data Warehouse, and Machine Learning. We're also happy to discuss your specific needs; in that case please reach out to your Cloudera account manager or get in touch. The post DNS Zone Setup Best Practices on Azure appeared first on Cloudera Blog. View the full article
  22. Technical Architecture First, let's turn to the architecture and walk through each of its tiers in detail. These components are commonly associated with applications that follow the principles of Domain-Driven Design (DDD), Model-View-Controller (MVC), or similar architectural patterns. Let me cover them one by one: View the full article
  23. Hello everyone, I am currently working on integrating automated rollbacks into our CI/CD pipeline to ensure a more robust deployment process. Our team is looking to find the best methods and tools that can be adopted to make this transition as smooth as possible. I came across this article, https://docs.aws.amazon.com/codedeploy/latest/userguide/deployments-rollback-and-redeploy.html, which provided some insights, but I'd love to hear from those of you who have hands-on experience in this area: What strategies have you implemented for automated rollbacks? How do you handle rollback complexities, especially when dealing with dependencies? Are there any specific tools or platforms that you recommend? What lessons have you learned and what pitfalls should we be aware of? Your real-world experiences will supplement what we've learned from the literature and help us make more informed decisions. Thanks in advance!
  24. As container adoption continues to surge and cyber-attacks become more sophisticated, securing containerized applications has become a critical task. This is a complex task that involves securing the containerized application and the different tools interacting with the container. However, with a bit of forethought and the right measures in place, you can secure the entire container lifecycle. In this article, we'll cover some practical security best practices you can implement today to help secure your container infrastructure.

Why Container Security Matters

According to a recent report by Gartner, more than 70% of global organizations were running more than two containerized applications in 2023, up from less than 25% in 2020. This means that container security is not only a technical issue but also a business and compliance issue. If your containers are compromised, you could face data breaches, service disruptions, reputation damage, and legal liabilities. Therefore, it's essential to embrace a DevSecOps approach, where security is embedded into every phase of the development, deployment, and runtime cycle. To learn more about container security in general, check out our article Security & Containerization - How to Secure Containers.

Container Security

To fully secure your containerized workloads in production, you have to secure the container's host, image, runtime, registry, and orchestrator.

Host and OS Security

The first step to secure your containers is to secure the underlying host and operating system that runs them. The host and OS provide the foundation for your container stack, and any weakness in them can compromise the entire stack. Here are some of the best practices for host and OS security:

Use a minimal and dedicated OS for your containers. A minimal OS has fewer packages and features, which reduces the attack surface and the need for patching. A dedicated OS is optimized for running containers and has no other applications or services that could interfere with them. Some examples of minimal and dedicated OSes for containers are CoreOS, RancherOS, and Ubuntu Core.
Keep your host and OS up to date. Regularly apply security patches and updates to your host and OS to fix any known vulnerabilities and bugs. You can use tools like Ansible or Chef to automate the update process and ensure consistency across your hosts.
Harden your host and OS configuration. Apply security best practices such as disabling unnecessary services and ports, enforcing strong passwords and encryption, limiting user access and privileges, and enabling firewall and antivirus protection. Tools like CIS Benchmarks or OpenSCAP can help you audit and enforce your configuration settings.
Monitor your host and OS activity. Use Prometheus or Grafana to collect and visualize metrics such as CPU, memory, disk, network, and process usage. Use tools such as Falco or Auditd to detect and alert you on any suspicious or anomalous activity, for instance, unauthorized access, file changes, or system calls.

Container Image Security

The next step to secure your containers is to secure the images used to build them. Here are some of the best practices for container image security:

Use trusted and verified images. Use images from official repositories or vendors, and verify their authenticity and integrity using digital signatures and checksums. Avoid using images from unknown or untrusted sources, as they may contain malicious code or vulnerabilities.
Scan your images for vulnerabilities.
Scan your images with Clair or Trivy for any known vulnerabilities in the application code, libraries, or packages. Fix any critical or high-severity vulnerabilities before deploying your images to production. Use Anchore or Snyk to monitor and update your images for any new vulnerabilities.
Minimize your image size and layers. Use tools like Dockerfile or BuildKit to create efficiently sized images. Minimizing your image size and layers can improve your image performance, portability, and security, as well as reduce the attack surface and the exposure time. Learn more about layers from this blog: What Are Docker Image Layers and How Do They Work?
Encrypt and protect your image data. Encrypt your image data with strong encryption algorithms and keys using tools like Docker or Podman. This protects your image content from unauthorized access or tampering. Use Vault or Sealed Secrets to secure and manage your image secrets, such as passwords, tokens, or certificates.

Container Runtime Security

The third step to secure your containers involves securing the runtime that runs them. Here are some of the best practices for container runtime security:

Use a secure and compatible runtime. Use a runtime that supports the security features and standards that you need for your containers, such as namespaces, cgroups, seccomp, apparmor, SELinux, or OCI. Some examples of secure and compatible runtimes are Docker, CRI-O, and containerd.
Limit your container privileges and resources. Use tools like Docker or Kubernetes to limit the privileges and resources, such as the network, filesystem, memory, CPU, or devices, that your containers can access and use. This can prevent privilege escalation, resource exhaustion, or denial-of-service attacks. You can also use tools like gVisor or Kata Containers to isolate your containers using sandboxing techniques, such as virtualization or user-space kernels.

Container Registry Security

The fourth step is to secure the container registry storing your images. Here are some of the best practices for container registry security:

Use a private and secure registry. Use a registry that offers the security and compliance features that you need for your containers, such as encryption, authentication, authorization, auditing, or scanning. Some examples of private and secure registries include Docker Hub, Harbor, and Quay.
Control your registry access and permissions. Implement authorization to control who has access to your registry and what they can do with the images. Additionally, implement multi-factor authentication to provide an extra layer of security for your registry.
Backup and replicate your registry data. Use backup tools like Velero or Rsync to back up and restore your registry data in case of data loss, corruption, or disaster. Also, use tools like Skopeo or Crane to replicate your registry data across different regions or zones for high availability and faster recovery in case of a successful attack.

Container Orchestrator Security

The fifth and final step to secure your containers is to secure the container orchestrator. You do that by following these best practices:

Use a mature and reliable orchestrator. Use an orchestrator that has a proven track record of stability, performance, and security. Examples of such orchestrators include Kubernetes, Docker Swarm, and Mesos. Use the latest stable version of the orchestrator and keep it updated with the latest security patches and bug fixes.
Harden your orchestrator's security.
Apply security best practices to your orchestrator configuration, such as enabling TLS encryption, RBAC authorization, network policies, Pod security policies, and admission controllers. You can use tools like CIS Benchmarks or OpenSCAP to audit and enforce your configuration settings.

If you use Kubernetes as your container orchestrator, you should check out the following articles:

Kubernetes Security Best Practices
Certified Kubernetes Administrator Exam Series (Part-6): Security

Take the first step to certify your container security skills by enrolling in our Certified Kubernetes Security Specialist (CKS) course.

Conclusion

Container security should be a top priority as attacks become more sophisticated. You need to secure every layer of your container stack, from the host OS to the orchestrator, using the best practices we discussed. By doing so, you can enjoy the benefits of containers without compromising the security of your applications and data.

If you have any questions or feedback, please leave a comment below. If you're interested in learning more about DevOps, you can sign up for a free account on KodeKloud. As a member, you'll have access to 70+ courses, labs, quizzes, and projects that will help you master different DevOps skills. View the full article
  25. Editor's note: In the previous post in this series we introduced primary key default values in Spanner. We also showed how to use UUIDs and integer SEQUENCEs to automatically generate keys in the database. In this post we'll show how to use these new capabilities to migrate schemas and data from other databases to Spanner, minimizing changes to downstream applications and ensuring Spanner best practices.

Spanner is a distributed relational database that is designed for the highest levels of availability and consistency at any scale. It allows users to seamlessly scale resources up and down to optimize costs based on their real-time needs, while maintaining continuous operation. Customers in gaming, retail, financial services, and many other industries rely on Spanner for their most demanding workloads.

Migrating to Spanner

Many of these workloads did not start on Spanner, though. Customers come to Spanner from different relational and non-relational databases, looking to take advantage of Spanner's seamless scaling and fully managed experience. Spanner provides a set of tools and best practices to facilitate migrations. The Spanner Migration Tools include assessment, schema translation, and data movement for terabyte-sized databases coming from MySQL and PostgreSQL. For broader migration guidance, you can refer to the Spanner documentation.

In this post we'll focus specifically on migrating databases that use auto-generated keys, in particular, auto-incrementing sequential integers and UUIDs. Each of the migration strategies below addresses the key requirements:

Ensure the fidelity and correctness of the migrated keys
Minimize downstream application changes, such as changing types or values of the keys themselves
Support replication scenarios where either the source or target database generates the keys and data is synchronized between them, for example, to do a live cutover between systems
Implement Spanner best practices for performance and scalability

Migrating sequential keys

We'll start with the most common scenario for relational workloads coming to Spanner: migrating from a single-instance database that uses sequential monotonic keys, for example AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL, or the standard IDENTITY type in SQL Server or Oracle. For databases that manage writes on a single machine, a counter to provide sequential keys is simple. However, ordered keys can cause performance hotspots in a distributed system like Spanner.

At a high level, the strategy to migrate sequential keys to Spanner is:

Define a copy of the table in Spanner using an integer primary key, just as in the source database.
Create a sequence in Spanner and set the table's primary key to use it for its default value.
Load the data with its keys as-is from the source database into Spanner, for example using the Spanner Migration Tool or the lower-level Dataflow templates.
Optionally set foreign key constraints for any dependent tables.
Before inserting new data, configure the Spanner sequence to skip values in the range of the existing keys.
Insert new data, as before, allowing the sequence to generate keys by default.

Let's start by defining the table and related sequence.
In Spanner you define a new SEQUENCE object and set it as the default primary key value of the destination table, for example using the GoogleSQL dialect:

CREATE SEQUENCE singer_id_sequence OPTIONS (
  sequence_kind = 'bit_reversed_positive'
);

CREATE TABLE singers (
  singer_id INT64 DEFAULT
    (GET_NEXT_SEQUENCE_VALUE(SEQUENCE singer_id_sequence)),
  name STRING(1024),
  biography STRING(MAX),
) PRIMARY KEY (singer_id);

CREATE TABLE albums (
  album_id INT64,
  singer_id INT64,
  album_name STRING(1024),
  song_list STRING(MAX),
  CONSTRAINT FK_singer_album
    FOREIGN KEY (singer_id) REFERENCES singers (singer_id)
) PRIMARY KEY (album_id);

The required bit_reversed_positive option indicates that the numbers generated by the sequence will be greater than zero, but not ordered (see the introductory post for more information on bit-reversed sequences). Generated values are of type INT64. As you migrate existing rows from your source database to Spanner, the rows' keys remain unchanged. For new inserts that don't specify a primary key, Spanner automatically calls the GET_NEXT_SEQUENCE_VALUE() function to retrieve a new number. Since these values distribute uniformly across the range [1, 2^63], there could be collisions with the existing keys. If this occurred, your insert would fail with a "key already exists" error. To prevent this, you can configure the sequence to skip the range of values covered by the existing keys. For example, assuming that the table singers was migrated from PostgreSQL, where its key, singer_id, was of SERIAL type:

CREATE TABLE singers (
  singer_id SERIAL PRIMARY KEY,
  name varchar(1024),
  biography varchar
);

The column values are monotonically increasing. After migration, we retrieve the maximum value of singer_id:

SELECT MAX(singer_id) FROM singers;

Assuming the returned value is 20,000, you configure the sequence in Spanner to skip the range [1, 21000]. The extra 1,000 serves as a buffer to accommodate writes to the source database after the initial bulk migration. These values would typically be replicated later, and we want to ensure they also will not conflict.

ALTER SEQUENCE singer_id_sequence SET OPTIONS (
  skip_range_min = 1,
  skip_range_max = 21000
);

The diagram below illustrates a few migrated rows along with new rows inserted in Spanner after migration. Now new keys generated in Spanner are guaranteed not to conflict with the range of keys generated in the source PostgreSQL database.

Multi-database usage

You can take this skipped-range concept one step further to support scenarios where either Spanner or the upstream database generates primary keys, for example to enable replication in either direction for disaster recovery during a migration cutover. To support this, you can configure each database to have a non-overlapping key value range. When you define a range for the other database, you can tell the Spanner sequence to skip over that range with the skipped-range syntax.
For example, after the bulk migration of our music tracking application, we'll replicate data from PostgreSQL to Spanner to reduce the amount of time it takes to cut over. Once we've updated and tested the application on Spanner, we'll cut over from PostgreSQL to Spanner, making it the system of record for updates and new primary keys. When we do, we'll reverse the flow of data between databases and replicate data back to the PostgreSQL instance, just in case we need to revert if there's a problem. In this scenario, since SERIAL keys in PostgreSQL are 32-bit signed integers, while our keys in Spanner are larger 64-bit numbers, we will do the following steps:

1. In PostgreSQL, alter the key column to be a 64-bit column, or bigint:

ALTER TABLE singers ALTER COLUMN singer_id TYPE bigint;

2. Since the sequence singers_singer_id_seq used by singer_id is still of type int, its maximum value is already capped at 2^31 - 1. To be safe, we can optionally add a CHECK constraint to the table in the source PostgreSQL database to ensure that singer_id values are always smaller than or equal to 2^31 - 1:

ALTER TABLE singers ADD CHECK (singer_id <= 2147483647);

3. In Spanner, we'll alter the sequence to skip the range [1, 2^31 - 1]:

ALTER SEQUENCE singer_id_sequence SET OPTIONS (
  skip_range_min = 1,
  skip_range_max = 2147483647 -- 2^31-1
);

4. Deploy and test your usage, including from PostgreSQL to Spanner and vice versa.

Using this technique, PostgreSQL will always generate keys in the 32-bit integer space, while Spanner's keys are restricted to the portion of the 64-bit integer space above all 32-bit values, wide enough for future growth. This ensures that both systems can independently generate keys that are guaranteed not to conflict.

Migrating UUIDs

UUID primary keys are generally easier to migrate than sequential integer keys. UUIDs, v4 in particular, are effectively unique regardless of where they are generated. (The math behind this is an interesting application of the birthday problem in statistics.) As a result, UUID keys generated elsewhere will integrate easily with new UUID keys generated in Spanner and vice versa.

The high-level strategy for migrating UUID keys is as follows:

Define your UUID keys in Spanner using string columns with a default expression, GENERATE_UUID() or spanner.generate_uuid() in the PostgreSQL dialect.
Export data from the source system, serializing the UUID keys as strings.
Import the keys into Spanner as-is.
Optionally enable foreign keys.

In Spanner, you define a UUID primary key column as a STRING or TEXT type and assign GENERATE_UUID() as its default value. During migration, you bring all values of existing rows from the source database to Spanner, including key values. (See this migration guide for more details.) After migration, as new rows are inserted, Spanner calls GENERATE_UUID() to generate new UUID values for them.
For example, the primary key fan_club_id will get a UUIDv4 value when we insert a new row into the fan_clubs table:

CREATE TABLE fan_clubs (
  fan_club_id STRING(36) DEFAULT (GENERATE_UUID()),
  club_name STRING(1024),
) PRIMARY KEY (fan_club_id);

INSERT INTO fan_clubs (club_name) VALUES ("SwiftFanClub");

Migrating your own primary keys

Bit-reversed sequences and UUIDs provide unique values that won't hotspot at scale when used as a primary key in Spanner. But they don't provide any guarantees on the ordering of their values… by design! However, some applications rely on the order of the keys to determine recency or to sequence newly created data. Databases manually sharded for scale typically rely on a global counter, coordinated outside of any independent database instances. To use ordered keys generated externally in Spanner, you create a composite key that combines a uniformly distributed value, such as a shard ID or a hash, as the first component and a sequential number as the second component. This preserves the ordered key values but won't hotspot at scale.

In this example, we are migrating a MySQL table with an AUTO_INCREMENT primary key, students, to Spanner. The downstream application generates student IDs, and the IDs are shared with end users (students, faculty, etc.).

// This is the table to be migrated from MySQL
CREATE TABLE students (
  student_id INT NOT NULL AUTO_INCREMENT,
  info VARCHAR(2048),
  PRIMARY KEY (student_id)
);

In Spanner, we add a generated column containing a hash of the student_id column (FARM_FINGERPRINT takes a STRING or BYTES argument, so the integer key is cast to STRING here):

CREATE TABLE student (
  student_id_hash INT64 AS (FARM_FINGERPRINT(CAST(student_id AS STRING))) STORED,
  student_id INT64 NOT NULL,
  info STRING(2048),
) PRIMARY KEY (student_id_hash, student_id);

Get started today

We recently introduced new capabilities that help users implement best practices for primary keys in Spanner using the SQL concepts they already know. The strategies detailed above minimize downstream application changes and maximize performance and availability in Spanner when migrating auto-incrementing and UUID keys from other relational databases. You can learn more about what makes Spanner unique and how it's being used today. Or try it yourself for free for 90 days or for as little as $65 USD/month for a production-ready instance that grows with your business without downtime or disruptive rearchitecture.