Search the Community

Showing results for tags 'yaml'.

Found 9 results

  1. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that lets you use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure. In Airflow, Directed Acyclic Graphs (DAGs) are defined as Python code. Dynamic DAGs refer to the ability to generate DAGs on the fly at runtime, typically based on external conditions, configurations, or parameters. Dynamic DAGs help you create, schedule, and run tasks within a DAG based on data and configurations that may change over time.

     There are various ways to introduce dynamism into Airflow DAGs (dynamic DAG generation), such as using environment variables and external files. One approach is the DAG Factory method, which uses a YAML-based configuration file. The dag-factory library aims to facilitate the creation and configuration of new DAGs by using declarative parameters in YAML. It allows default customizations and is open source, making it simple to create and customize new functionality. In this post, we explore the process of creating dynamic DAGs with YAML files, using the DAG Factory library.

     Dynamic DAGs offer several benefits:

     • Enhanced code reusability – By structuring DAGs through YAML files, we promote reusable components, reducing redundancy in your workflow definitions.
     • Streamlined maintenance – YAML-based DAG generation simplifies the process of modifying and updating workflows, ensuring smoother maintenance procedures.
     • Flexible parameterization – With YAML, you can parameterize DAG configurations, facilitating dynamic adjustments to workflows based on varying requirements.
     • Improved scheduler efficiency – Dynamic DAGs enable more efficient scheduling, optimizing resource allocation and enhancing overall workflow runs.
     • Enhanced scalability – YAML-driven DAGs allow for parallel runs, enabling scalable workflows capable of handling increased workloads efficiently.

     By harnessing the power of YAML files and the DAG Factory library, we unleash a versatile approach to building and managing DAGs, empowering you to create robust, scalable, and maintainable data pipelines.

     Overview of solution

     In this post, we use an example DAG file that is designed to process a COVID-19 data set. The workflow processes an open source data set offered by WHO-COVID-19-Global. After we install the dag-factory Python package, we create a YAML file that holds the definitions of the various tasks. We process the country-specific death count by passing Country as a variable, which creates individual country-based DAGs. The following diagram illustrates the overall solution along with data flows within logical blocks.

     Prerequisites

     For this walkthrough, you should have the following prerequisites:

     • An AWS account. If you don't already have an AWS account, you can sign up for one.
     • Python 3.6.0+ and an Amazon MWAA 2.0+ environment in order to use the dag-factory library.

     Additionally, complete the following steps (run the setup in an AWS Region where Amazon MWAA is available):

     • Create an Amazon MWAA environment (if you don't have one already). If this is your first time using Amazon MWAA, refer to Introducing Amazon Managed Workflows for Apache Airflow (MWAA).
     • Make sure the AWS Identity and Access Management (IAM) user or role used for setting up the environment has IAM policies attached for the following permissions:
       - Read and write access to Amazon Simple Storage Service (Amazon S3). For details, refer to Amazon S3: Allows read and write access to objects in an S3 Bucket, programmatically and in the console.
       - Full access to the Amazon MWAA console.
       The access policies mentioned here are just for the example in this post. In a production environment, provide only the needed granular permissions by exercising least-privilege principles.
     • Create a unique (within the account) S3 bucket while creating your Amazon MWAA environment, and create folders called dags and requirements.
     • Create and upload a requirements.txt file with the following content to the requirements folder. Replace {Airflow-version} with your environment's Apache Airflow version number, and {Python-version} with the version of Python that's compatible with your environment:

         --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-{Airflow-version}/constraints-{Python-version}.txt"
         dag-factory==0.19.0
         pandas==2.1.4

       Pandas is needed only for the example use case described in this post; dag-factory is the only required plug-in. It is recommended to check the compatibility of the latest version of dag-factory with Amazon MWAA. The boto and psycopg2-binary libraries are included with the Apache Airflow v2 base install and don't need to be specified in your requirements.txt file.
     • Download the WHO-COVID-19-global data file to your local machine and upload it under the dags prefix of your S3 bucket (a scripted upload is sketched after this list).
     • Make sure that your environment is pointing to the latest S3 object version of your requirements.txt file so that the additional packages are installed. This typically takes 15 to 20 minutes, depending on your environment configuration.
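
     You can perform these uploads from the Amazon S3 console; if you prefer to script them, a minimal boto3 sketch along the following lines also works (the bucket name below is the placeholder used later in this post, and the local file paths are assumptions, so adjust both to your environment):

         # Sketch: upload requirements.txt and the WHO data file to the MWAA bucket.
         # Bucket name and local paths are placeholders; adjust to your environment.
         import boto3

         s3 = boto3.client("s3")
         S3_BUCKET = "my-mwaa-assets-bucket-sfj33ddkm"  # replace with your bucket name

         s3.upload_file("requirements.txt", S3_BUCKET, "requirements/requirements.txt")
         s3.upload_file(
             "WHO-COVID-19-global-data.csv", S3_BUCKET, "dags/WHO-COVID-19-global-data.csv"
         )
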
     Validate the DAGs

     When your Amazon MWAA environment shows as Available on the Amazon MWAA console, navigate to the Airflow UI by choosing Open Airflow UI next to your environment. Verify the existing DAGs by navigating to the DAGs tab.

     Configure your DAGs

     Complete the following steps:

     • Create empty files named dynamic_dags.yml, example_dag_factory.py, and process_s3_data.py on your local machine.
     • Edit process_s3_data.py, save it with the following code, then upload the file to the dags folder of your S3 bucket. The code does some basic data processing:
       - Reads the file from an Amazon S3 location.
       - Renames the Country_code column to country.
       - Filters the data by the given country.
       - Writes the processed data back to the S3 prefix in CSV format.

         import boto3
         import pandas as pd
         import io

         def process_s3_data(COUNTRY):
             ### Top level variables: replace S3_BUCKET with your bucket name ###
             s3 = boto3.client('s3')
             S3_BUCKET = "my-mwaa-assets-bucket-sfj33ddkm"
             INPUT_KEY = "dags/WHO-COVID-19-global-data.csv"
             OUTPUT_KEY = "dags/count_death"
             ### get csv file ###
             response = s3.get_object(Bucket=S3_BUCKET, Key=INPUT_KEY)
             status = response['ResponseMetadata']['HTTPStatusCode']
             if status == 200:
                 ### read csv file and filter based on the country to write back ###
                 df = pd.read_csv(response.get("Body"))
                 df.rename(columns={"Country_code": "country"}, inplace=True)
                 filtered_df = df[df['country'] == COUNTRY]
                 with io.StringIO() as csv_buffer:
                     filtered_df.to_csv(csv_buffer, index=False)
                     response = s3.put_object(
                         Bucket=S3_BUCKET,
                         Key=OUTPUT_KEY + '_' + COUNTRY + '.csv',
                         Body=csv_buffer.getvalue()
                     )
                     status = response['ResponseMetadata']['HTTPStatusCode']
                     if status == 200:
                         print(f"Successful S3 put_object response. Status - {status}")
                     else:
                         print(f"Unsuccessful S3 put_object response. Status - {status}")
             else:
                 print(f"Unsuccessful S3 get_object response. Status - {status}")
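
       Optionally (this step is not in the original walkthrough), you can smoke-test the function locally before wiring it into a DAG. This assumes AWS credentials with access to your bucket and that you have set S3_BUCKET to your own bucket name:

         # Optional local smoke test, run from the folder containing process_s3_data.py.
         from process_s3_data import process_s3_data

         # On success this prints "Successful S3 put_object response. Status - 200"
         # and writes dags/count_death_India.csv to the bucket.
         process_s3_data("India")
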
Status - {status}") else: print(f"Unsuccessful S3 get_object response. Status - {status}") Edit the dynamic_dags.yml and save it with the following code content, then upload the file back to the dags folder. We are stitching various DAGs based on the country as follows: Define the default arguments that are passed to all DAGs. Create a DAG definition for individual countries by passing op_args Map the process_s3_data function with python_callable_name. Use Python Operator to process csv file data stored in Amazon S3 bucket. We have set schedule_interval as 10 minutes, but feel free to adjust this value as needed. default: default_args: owner: "airflow" start_date: "2024-03-01" retries: 1 retry_delay_sec: 300 concurrency: 1 max_active_runs: 1 dagrun_timeout_sec: 600 default_view: "tree" orientation: "LR" schedule_interval: "*/10 * * * *" module3_dynamic_dag_Australia: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Australia" module3_dynamic_dag_Brazil: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Brazil" module3_dynamic_dag_India: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "India" module3_dynamic_dag_Japan: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Japan" module3_dynamic_dag_Mexico: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Mexico" module3_dynamic_dag_Russia: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Russia" module3_dynamic_dag_Spain: tasks: task_process_s3_data: task_id: process_s3_data operator: airflow.operators.python.PythonOperator python_callable_name: process_s3_data python_callable_file: /usr/local/airflow/dags/process_s3_data.py op_args: - "Spain" Edit the file example_dag_factory.py and save it with the following code content, then upload the file back to dags folder. The code cleans the existing the DAGs and generates clean_dags() method and the creating new DAGs using the generate_dags() method from the DagFactory instance. from airflow import DAG import dagfactory config_file = "/usr/local/airflow/dags/dynamic_dags.yml" example_dag_factory = dagfactory.DagFactory(config_file) ## to clean up or delete any existing DAGs ## example_dag_factory.clean_dags(globals()) ## generate and create new DAGs ## example_dag_factory.generate_dags(globals()) After you upload the files, go back to the Airflow UI console and navigate to the DAGs tab, where you will find new DAGs. 
     After you upload the files, go back to the Airflow UI and navigate to the DAGs tab, where the new DAGs appear. You can enable the DAGs by making them active and testing them individually. Upon activation, an additional CSV file named count_death_{COUNTRY_CODE}.csv is generated in the dags folder.

     Cleaning up

     There may be costs associated with using the various AWS services discussed in this post. To prevent incurring future charges, delete the Amazon MWAA environment after you have completed the tasks outlined in this post, and empty and delete the S3 bucket.

     Conclusion

     In this blog post, we demonstrated how to use the dag-factory library to create dynamic DAGs. Dynamic DAGs are characterized by their ability to generate results with each parsing of the DAG file, based on configurations. Consider using dynamic DAGs in the following scenarios:

     • Automating migration from a legacy system to Airflow, where flexibility in DAG generation is crucial
     • Situations where only a parameter changes between different DAGs, streamlining workflow management
     • Managing DAGs that are reliant on the evolving structure of a source system, providing adaptability to changes
     • Establishing standardized practices for DAGs across your team or organization by creating these blueprints, promoting consistency and efficiency
     • Embracing YAML-based declarations over complex Python coding, simplifying DAG configuration and maintenance
     • Creating data-driven workflows that adapt and evolve based on data inputs, enabling efficient automation

     By incorporating dynamic DAGs into your workflow, you can enhance automation, adaptability, and standardization, ultimately improving the efficiency and effectiveness of your data pipeline management. To learn more about the Amazon MWAA DAG Factory, visit Amazon MWAA for Analytics Workshop: DAG Factory. For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repository.

     About the Authors

     Jayesh Shinde is a Sr. Application Architect with AWS ProServe India. He specializes in creating cloud-centered solutions using modern software development practices like serverless, DevOps, and analytics. Harshd Yeola is a Sr. Cloud Architect with AWS ProServe India, helping customers migrate and modernize their infrastructure on AWS. He specializes in building DevSecOps and scalable infrastructure using containers, AIOps, and AWS developer tools and services.

     View the full article

  2. HashiCorp Terraform is a popular tool for managing infrastructure as code (IaC). By defining your IaC with Terraform, you can put your infrastructure configuration under version control and automate infrastructure deployment in a consistent and repeatable way. Azure DevOps Pipelines can be used to set up YAML pipelines to instrument the Terraform infrastructure deployments using […] The article Deploy Terraform using Azure DevOps YAML Pipelines appeared first on Build5Nines. View the full article
  3. A powerful way to write and build Azure Functions is to use C# Script. This lets you write functions directly within the Azure Portal, or even manage them in source control as simple .csx script files. You don't need to pre-compile the function before deploying it to Azure, and you can easily edit and release changes. […] The article Deploy C# Script Function App using Azure DevOps YAML Pipeline appeared first on Build5Nines. View the full article
  4. Although YAML is considered easy to understand, its syntax can be quite confusing. Use this guide to the basics; a small illustrative sketch follows below. Read More at Enable Sysadmin The post How to use YAML nesting, lists, and comments in Ansible playbooks appeared first on Linux.com. View the full article
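
     As a quick, generic illustration of how YAML nesting, lists, and comments behave (not taken from the linked guide), loading a playbook-style snippet with PyYAML shows how the structures map to Python objects:

         # Generic illustration (not from the linked guide): YAML nesting, lists,
         # and comments in a playbook-style snippet, loaded with PyYAML.
         import yaml

         PLAYBOOK = """
         # Comments start with '#' and are discarded by the parser.
         - name: Install and start nginx          # one play = one list item
           hosts: webservers
           tasks:                                 # nesting is expressed by indentation
             - name: Install package              # tasks are a nested list
               ansible.builtin.package:
                 name: nginx
                 state: present
             - name: Start service
               ansible.builtin.service:
                 name: nginx
                 state: started
         """

         plays = yaml.safe_load(PLAYBOOK)          # YAML sequences become Python lists
         print(type(plays).__name__, len(plays))   # list 1
         print(plays[0]["tasks"][1]["ansible.builtin.service"]["state"])  # started
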
  5. YAML (YAML Ain't Markup Language) is a frequently used data serialization language, often used for the configuration of tools such as Kubernetes, Jenkins, and the Serverless Framework, and it is supported in some fashion by most popular programming languages. More often than not, we keep these YAML files in version control, so how do we enforce a consistent format and style, and make sure we always push up validated YAML files? If we have a strict schema, how would we check hundreds of YAML files in one batch, or quickly confirm on the CLI that a single file is valid? Why wait for your app to tell you a YAML file is wrong once it's deployed when you can find out quickly on the CLI? A minimal batch-check sketch follows below. Read Article
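
     The linked article covers its own tooling; as a first pass, a batch syntax check is easy to sketch with PyYAML (dedicated linters such as yamllint go further and enforce style rules as well as syntax):

         # First-pass batch syntax check for YAML files in a repository.
         # This only catches parse errors; a linter such as yamllint also enforces
         # style rules (indentation, line length, duplicate keys, and so on).
         import sys
         from pathlib import Path

         import yaml

         def check_yaml_files(root: str) -> int:
             failures = 0
             for path in sorted(Path(root).rglob("*.y*ml")):   # *.yml and *.yaml
                 try:
                     with path.open() as f:
                         list(yaml.safe_load_all(f))           # handles multi-doc files
                     print(f"OK   {path}")
                 except yaml.YAMLError as err:
                     failures += 1
                     print(f"FAIL {path}: {err}")
             return failures

         if __name__ == "__main__":
             root = sys.argv[1] if len(sys.argv) > 1 else "."
             sys.exit(1 if check_yaml_files(root) else 0)
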
  6. Using Kubernetes is practically a synonym for manipulating YAML. Even though the YAML community describes it as a human-readable language, it can sometimes be tricky to read, especially in the context of Kubernetes, where you are manipulating complex deployments, services, ingresses, and other resources (a small sketch for getting an overview of such manifests follows below). View the full article
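
     As a generic aid (not from the article above), a few lines of Python can summarize a long multi-document manifest so you can see at a glance which resources it defines:

         # Summarize a multi-document Kubernetes manifest (deployments, services,
         # ingresses, ...) to get a quick overview before reading the full YAML.
         import yaml

         def summarize_manifest(path: str) -> None:
             with open(path) as f:
                 for doc in yaml.safe_load_all(f):   # one file, many documents
                     if not doc:
                         continue                    # skip empty documents ("---")
                     kind = doc.get("kind", "?")
                     name = doc.get("metadata", {}).get("name", "?")
                     print(f"{kind:<12} {name}")

         summarize_manifest("app.yaml")              # hypothetical manifest file
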
  7. When developing a service to deploy on Kubernetes, do you sometimes feel like you're more focused on your YAML files than on your application? When working with YAML, do you find it hard to detect errors early in the development process? We created Cloud Code to let you spend more time writing code and less time configuring your application, including authoring support features such as inline documentation, completions, and schema validation, a.k.a. "linting."

     [Screenshot: Completions provided by Cloud Code for a Kubernetes deployment.yaml file]
     [Screenshot: Inline documentation provided by Cloud Code for a Kubernetes deployment.yaml file]
     [Screenshot: Schema validation provided by Cloud Code for a Kubernetes deployment.yaml file]

     But over the years, working with Kubernetes YAML has become increasingly complex. As Kubernetes has grown more popular, many developers have extended the Kubernetes API with new Operators and Custom Resource Definitions (CRDs). These new Operators and CRDs have expanded the Kubernetes ecosystem with new functionality such as continuous integration and delivery, machine learning, and network security. Today, we're excited to share authoring support for a broad set of Kubernetes CRDs, including:

     • Over 400 popular Kubernetes CRDs out of the box—up from just a handful
     • Any existing CRDs in your Kubernetes cluster
     • Any CRDs you add from your local machine or a URL

     Cloud Code is a set of plugins for the VS Code and JetBrains Integrated Development Environments (IDEs), and provides everything you need to write, debug, and deploy your cloud-native applications. Now, its authoring support makes it easier to write, understand, and see errors in the YAML for a wide range of Kubernetes CRDs. Cloud Code's enhanced authoring support lets you leverage this custom Kubernetes functionality by creating a resource file that conforms to the CRD. For example, you might want to distribute your TensorFlow jobs across multiple pods in a cluster. You can do this by authoring a TFJob resource based on the TFJob CRD and applying it to the cluster, where the KubeFlow operator can act on it.

     Expanding built-in support

     Cloud Code has expanded authoring support to over 400 of the most popular Kubernetes CRDs, including those used by Google Cloud and Anthos. This includes a wide variety of CRDs such as:

     • Agones for game servers
     • Gatekeeper for enforcing policy
     • KubeFlow for machine learning workflows
     • Calico for networking and network security
     • cert-manager for managing and issuing TLS certificates
     • and many more

     [Screenshot: Inline documentation, completions, and schema validation for the Agones GameServer CRD provided by Cloud Code]

     Works with your cluster's CRDs

     While Cloud Code now supports a breadth of popular public, Google Cloud, and Anthos CRDs, you may have your own private CRDs installed on a cluster. When you set a cluster running Kubernetes v1.16 or above as the active context in Cloud Code's Kubernetes Explorer, Cloud Code automatically provides authoring support from the schema of all CRDs installed on the cluster.

     [Screenshot: The CronTab CRD installed on the active cluster in Cloud Code for VS Code's Kubernetes Explorer]
     [Screenshot: Authoring support provided by Cloud Code for the CronTab CRD installed on the active cluster]

     Add your own CRDs

     Despite the breadth of existing CRDs, you may find that there isn't one that meets your needs. The solution here is to define your own CRD.
     For example, if you're running your in-house CI system on Kubernetes, you could define your CRD schemas and let developers easily point Cloud Code to copies of those CRD schema files to get authoring assistance for the resources in their IDEs. To add a CRD to Cloud Code, just point Cloud Code to a local path or remote URL of a file defining the custom resource. The remote URL can be as simple as a direct link to a file on GitHub. If you want to learn more about custom resource definitions or create your own, take a look at this documentation page. Once configured, you get the same great inline documentation, completions, and linting from Cloud Code when editing that CRD's YAML files—and it's super easy to set up in both VS Code and JetBrains IDEs.

     [Screenshot: Specifying your own CRD in settings.json in VS Code]
     [Screenshot: Preferences > Other Settings > Cloud Code > Kubernetes in IntelliJ]

     Get started today

     To see how Cloud Code can help you simplify your Kubernetes development, we invite you to try out the expanded Kubernetes CRD authoring support. To get started, simply install Cloud Code from the VS Code or JetBrains extension marketplaces, open a CRD's YAML file, and start editing. Once you have Cloud Code installed, you can also try Cloud Code's fast, iterative development and debugging capabilities for your Kubernetes projects. Beyond Kubernetes, Cloud Code can also help you add Google Cloud APIs to your project or start developing a Cloud Run service with the Cloud Run Emulator.

     Related article: Best practices for building Kubernetes Operators and stateful apps. Read Article
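
     Cloud Code performs this kind of validation inside the IDE; to make the underlying idea concrete, here is a rough command-line approximation (an illustration only, not how Cloud Code works internally) that validates a custom resource against the openAPIV3Schema embedded in its CRD, using PyYAML and the jsonschema package. The file names are hypothetical.

         # Rough illustration of CRD schema validation outside the IDE: check a
         # custom resource against the openAPIV3Schema declared in its CRD.
         # Note: openAPIV3Schema is close to, but not identical to, JSON Schema,
         # so treat this as an approximate check. File names are hypothetical.
         import sys

         import yaml
         from jsonschema import Draft7Validator

         def validate_resource(crd_path: str, resource_path: str) -> None:
             with open(crd_path) as f:
                 crd = yaml.safe_load(f)
             with open(resource_path) as f:
                 resource = yaml.safe_load(f)

             # Use the schema of the first version declared in the CRD.
             schema = crd["spec"]["versions"][0]["schema"]["openAPIV3Schema"]

             errors = sorted(Draft7Validator(schema).iter_errors(resource), key=str)
             for err in errors:
                 print(f"{'/'.join(map(str, err.path)) or '<root>'}: {err.message}")
             if errors:
                 sys.exit(1)
             print("resource matches the CRD schema")

         validate_resource("crontab-crd.yaml", "my-crontab.yaml")
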
  8. Happy Friday! Our community continues to deliver interesting content. From PowerApps to VS Code, from secret scanning to cleaner YAML configurations, there's lots to learn.

     • Using Azure DevOps to Build and Deploy Multiple PCF Controls – Tae joins us to share a CI pipeline for PCF Controls for PowerApps!
     • Secret Scanning – Protecting your code in Azure DevOps – Mark describes one way to implement secret scanning for your project.
     • Hacktoberfest: How I Bundled & Published A VS Code Extension With Webpack And Azure DevOps – Juan walks through his process for building and publishing a VS Code extension with Azure DevOps.
     • Shrink your Azure Devops YAML pipelines – Want to simplify your YAML pipelines? Damien has some ideas for you.

     If you've written an article about Azure DevOps or find some great content about DevOps on Azure, please share it with the #AzureDevOps hashtag on Twitter! The post Top Stories from the Microsoft DevOps Community – 2020.10.02 appeared first on Azure DevOps Blog. View the full article
  9. Infrastructure as Code on AWS in a familiar language — the right way with InGraph

     InGraph is CloudFormation in Python syntax instead of YAML. TL;DR: check out the project called InGraph.

     I'm on the record as preferring declarative infrastructure as code (IaC) to imperative versions, such as the AWS CDK. I believe that declarative IaC has a lower total cost of ownership (TCO). But while I prefer declarative to imperative, imperative IaC enables something I consider much worse: infrastructure as imperative programs that generate declarative IaC documents. Almost all imperative IaC frameworks work this way. There are two aspects to this that I consider particularly damaging.

     First, these programs are generally not enforced to be deterministic, and when determinism isn't enforced, people are always going to find clever ways to solve their problems by externalizing state, and then your CI/CD process no longer has repeatable builds. Is this fixable? Sure, if you stick to certain practices and processes you can mitigate this problem, but as Forrest Brazeal says, CI/CD is the Conway's Law-iest part of your stack, and your ability to accomplish those practices may be limited. So it's better to stick to tooling that inherently has the properties you want, like determinism.

     The second aspect is that the generative step is inevitably lossy. The primary reason people build these imperative frameworks is that they are unhappy with the constraints of the declarative language they decide to generate. That's actually not true — the primary reason is they just want tab completion — but the second most important reason is to allow constructions that aren't possible otherwise. The reason I consider this a problem is that, when the abstractions built up in the imperative program are lossily flattened, the developer's mental models are lost and not fully reflected in the deployed infrastructure. We need our mental models, as expressed in the IaC we write, to be reified in the cloud, so that when something goes wrong, we can dive in and understand the deployed infrastructure in terms of the code we have written. The less context from the original that's present, the less our tooling helps us understand what we've actually created. Metadata tags are not sufficient; we need proper cloud-side representation and API support.

     An additional problem with the generative step is that, in addition to being lossy, it often does not aim to produce human-consumable output. A regular refrain heard in CDK contexts is "CloudFormation is assembly code." How many C++ developers are adept at diving into assembly code to debug problems? Assembly code is not intended to be human-friendly. But for infrastructure as code, many people involved in the lifecycle of cloud resources are going to want to collaborate with the developer using CloudFormation. Security folks aren't going to want to review developer programs in a variety of languages across the organization, and given the lack of deterministic guarantees, they're going to want to review exactly what's being deployed. Operations personnel are in a similar situation. But if the generative step produces verbose templates that the developer can't easily navigate, this collaboration is going to have a lot of friction. When security points to a resource they have an issue with, it's going to be harder to connect that with a remediation to the original program. Here again there are often suggestions that processes can mitigate this pain, but as before, that may not be possible. If IaC in a familiar language has value, a developer should be able to get most of the value, without drawbacks, without needing the whole organization to change.
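
     To make the "lossy flattening" argument concrete, here is a deliberately generic sketch (not the CDK, not InGraph, just the pattern being described) of an imperative program emitting a declarative template. The loop that carries the developer's intent (one artifact bucket per team) leaves no trace in the output besides numbered logical IDs.

         # Generic illustration of the pattern being critiqued: an imperative program
         # that generates a declarative template. The for-loop is the developer's
         # mental model, but only flattened, numbered resources survive in the output.
         import json

         teams = ["payments", "search", "analytics"]

         template = {"AWSTemplateFormatVersion": "2010-09-09", "Resources": {}}
         for i, team in enumerate(teams):
             template["Resources"][f"ArtifactBucket{i}"] = {
                 "Type": "AWS::S3::Bucket",
                 "Properties": {"BucketName": f"example-{team}-artifacts"},
             }

         # A reviewer of the template sees three unrelated buckets, not "one per team".
         print(json.dumps(template, indent=2))
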
     It turns out there's a way to address these problems. Spoiler alert: it's a project called InGraph. You should go read about it in their introductory blog post, but I want to add some color to the development process it went through, and where I see future value in the project.

     Last fall, I gave a talk at Serverlessconf on this topic. One of the things I stressed was that with full parity between an imperative IaC system and a declarative system underlying it, the choice becomes a matter of preference, without TCO pitfalls. Some time later, I got a DM on Twitter from Farzad Senart asking if I'd like to look at a project for IaC in Python, based on the ideas I laid out in my talk. Obviously I was flattered, and agreed to see what they'd built.

     What they showed me blew my mind. They had not taken the route that everyone else had taken, building a library so that the user's code became a program in the imperative language that generated declarative IaC. Instead, they hooked into the Python interpreter itself. The user didn't need to import anything; the interpreter understands how to map the AST, and the instantiated code, into a CloudFormation template. I've pointed out before that YAML isn't a language — it's a syntax. CloudFormation is a declarative programming language with YAML syntax. What Farzad and his compatriot Lionel Suss had done was build a new IaC programming language with Python syntax! Everything an IDE can do with Python, it could do with these programs.

     There was a lot of promise in the prototype I was shown. It didn't draw the L1/L2 distinction that the CDK draws, where the design of abstractions departs from the representation of native CloudFormation resources; high-level abstractions use the same model. Still, that initial version departed from what I hope to see in an ideal imperative IaC framework. For example, it still allowed nondeterministic programs.

     There were two features that I thought were really important. The first is the ability to produce CloudFormation templates that have template parameters, which produces more parity between imperative and declarative; you are not producing a template that is an instantiation of your program for a specific stack, but rather transforming your program from an imperative language into a representation in the declarative language, which can then be used on its own. This means that the semantics of your program are not resolved at compile time, which is great when you want to create flexible templates that you can provide across your organization, for example in AWS Service Catalog. The second is that your variable name for a given resource should become the logical ID for that resource in the template. For me, this is the essence of the concept of reifying the developer's mental models in the cloud: what you call that resource is what CloudFormation calls it too.

     Over the next few months, Lionel and Farzad continued to evolve the framework, experimenting with different approaches. I was impressed by the speed and capability with which they could produce new features by hooking into the Python interpreter. The framework started enforcing that the program was deterministic, and could be passed a JSON object at run time to set the value of parameters within the program.
     But then they took it further: they created a feature where a function parameter turns into a template parameter only if it was not given a value in the program. So the parts of your Python program that were not fully determined became not fully determined inside the output template!

     Another very cool feature is that because they are running the interpreter, they know literally all the files you touch, and as part of a run they package up all of your source, the input parameters you've given, the output template, and the metadata about how all of it relates, and upload all of that together to S3 as a build artifact that can be referenced, without you having to explicitly define anything other than the main Python file as the entry point. They even demoed a Chrome extension that leveraged that artifact to let you go from a resource in the CloudFormation console back to the place in your program's source where it was defined.

     The final piece of the puzzle, which is where I think they've really unlocked a special idea, is the following: what if you could only write Python that could be fully represented as a CloudFormation template? You're now no longer writing in a different language. You're writing CloudFormation with Python syntax instead of YAML syntax. Obviously, this constrains the Python features you can use. For example, you can use string formatting, because that can be turned into Fn::Sub in the template, but you can't do numeric math, because that's not possible in a template. You might object: why would I want to write Python with such constraints? It's because your templates are literally your programs. You can look at the template and instantly know how your program was mapped to it.
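
     As a toy illustration of the "string formatting becomes Fn::Sub" idea (my own sketch, not InGraph's actual implementation), a constrained Python format string can be rewritten mechanically into a template expression:

         # Toy illustration (not InGraph's implementation): rewrite a Python format
         # string into a CloudFormation Fn::Sub expression, turning {snake_case}
         # placeholders into ${PascalCase} template references.
         import json
         import re

         def format_to_sub(template_str: str) -> dict:
             def to_ref(match):
                 pascal = "".join(part.title() for part in match.group(1).split("_"))
                 return "${" + pascal + "}"
             return {"Fn::Sub": re.sub(r"\{(\w+)\}", to_ref, template_str)}

         print(json.dumps(format_to_sub("arn:aws:s3:::{bucket_name}/*")))
         # {"Fn::Sub": "arn:aws:s3:::${BucketName}/*"}
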
     One reason I'm excited about this, and this might be counterintuitive, is that it's a great way to demonstrate the most-needed features we should get support for in CloudFormation templates. It exposes the rough edges that need to be solved in CloudFormation, rather than papering over those gaps on the client side. For example, in an InGraph program you can build up abstractions, which use the same model as native CloudFormation resources. In the resource graph, they become subgraphs. CloudFormation doesn't have a way of representing this nested structure, so it has to be flattened. It's represented to the best extent possible — the logical ID of a resource is essentially the concatenation of its name with its containing resource's name — but it demonstrates that CloudFormation needs to provide us with ways to fully represent these subgraphs (since nested stacks do not perform this function, and have numerous shortcomings that people have documented).

     Today, your InGraph program must be fully deterministic and obey a subset of deterministic Python features; you can write another Python program, nondeterministic and/or using any Python feature, to generate a JSON output and use that as input to an InGraph program. But I think there's a cool future possibility here: you should be able to write Python code inside your program that goes beyond the constraints, as long as the execution graph is separable into unconstrained and constrained parts. For example, numeric math would just be performed in the unconstrained part. Then InGraph could snapshot the full state that comes out of the unconstrained part of the program, save it as an artifact, and then proceed to inject that state to resolve the constrained part that becomes the template.

     What you still wouldn't be able to do is use unconstrained code that needs values that only exist in the resolved template. So you couldn't, say, have a for loop that used the output of a resource, since that output only exists at deployment time.

     I'm excited about InGraph because, while I believe the lowest TCO is still with declarative templates, at the point where we have a principled imperative tool that helps your program become fully represented cloud-side, it's really just a matter of personal preference. It has potential as a tool to make it easier for developers to learn CloudFormation — really learn CloudFormation, not stay arm's length away from it — and to be able to converse, on the basis of their generated CloudFormation templates, with others in their organization.

     Finally, I'd like to give a plug to Farzad and Lionel at lifa.dev. Their level of expertise in this area has really impressed me. Over the course of the project they built, each time basically from the ground up, several different versions, each one essentially a domain-specific language with Python syntax. This seems to me like a really useful skill. Lionel and Farzad have created lifa.dev both as a home for their open-source developments and findings around AWS and as a way to provide their expertise in the field to companies worldwide. If you use AWS CloudFormation, you should check them out for assistance in your CI/CD journey.

     Infrastructure as Code on AWS in a familiar language — the right way with InGraph was originally published in A Cloud Guru on Medium, where people are continuing the conversation by highlighting and responding to this story. View the full article