Search the Community

Showing results for tags 'pipelines'.

Found 20 results

  1. Artificial Intelligence (AI) and Kubernetes are pillars of modern technology, each contributing significantly to innovation and efficiency. With AI adoption skyrocketing across industries, the demand for robust infrastructure to support AI workloads has surged. According to a recent Gartner report, global spending on AI is projected to grow from $124 billion in 2022 to $297 billion by 2027, with businesses increasingly investing in AI-driven solutions to gain a competitive edge. Concurrently, Kubernetes has emerged as the de facto standard for container orchestration. In CNCF's latest microsurvey on cloud-native FinOps and cloud financial management (CFM), 49% of respondents said Kubernetes has driven their cloud spending up, 28% said their costs remain unchanged, and 24% saved money after migrating to Kubernetes. This intersection of AI and Kubernetes signifies a paradigm shift, empowering organizations to harness AI at scale while leveraging Kubernetes' agility and scalability for seamless deployment and management. This post examines Kubernetes' role in managing end-to-end AI pipelines: developing, training, and deploying AI models at each phase of the process. We'll discuss how Kubernetes helps data scientists and machine learning engineers build efficient, repeatable workflows, increasing output and accelerating innovation in the AI space.

Understanding Kubernetes

Before discussing Kubernetes' importance in AI, let's look at what it is and how it works. Kubernetes, often known as K8s, is an open-source container orchestration tool first developed by Google. It simplifies and automates the deployment, scaling, and management of containerized applications. Containers allow applications, along with their dependencies and configurations, to be packaged and deployed in a lightweight, portable way. Kubernetes removes infrastructure concerns by providing a platform for delivering and managing containerized workloads.

Empowering Innovation: The Synergy Between AI and Kubernetes

AI and Kubernetes work hand in hand at the forefront of modern technology. AI is transforming industries through intelligent automation and decision-making, while Kubernetes provides the dependable infrastructure for deploying, scaling, and managing AI applications. Kubernetes coordinates AI workloads seamlessly across environments, optimizing resource usage and guaranteeing reliability. In return, AI leverages the scalability and agility of Kubernetes to deliver innovative solutions that increase productivity and encourage business growth. Together they form a dynamic synergy that allows businesses to take full advantage of AI technologies in a rapidly evolving digital ecosystem.

End-to-End AI Pipelines

AI pipelines are the complex, interconnected procedures used to create, train, and apply AI models. These pipelines typically include data preprocessing, model training, evaluation, tuning, and deployment. Managing these pipelines effectively at every stage requires automation and coordination, and Kubernetes provides the infrastructure needed to orchestrate them end to end. Let's look at how Kubernetes supports AI model development, training, and deployment.

Development Phase

During the development stage of an AI project, data scientists and machine learning engineers experiment with various algorithms, datasets, and model architectures to construct and enhance AI models. Kubernetes makes it easier to set up development environments by isolating every stage of the AI pipeline in containers. Developers define Kubernetes manifests that represent the desired state of application components, such as networking configurations, volumes, and containers. Kubernetes then automatically schedules and starts these containers across the cluster, ensuring consistent and repeatable development environments.

Training Phase

Once the model design is finished, the model is trained on large datasets. Training deep learning models often requires substantial processing power, such as GPUs or TPUs, for faster processing. Two of Kubernetes' advantages are its ability to scale resources independently in response to demand and to distribute computational jobs throughout the cluster. Data scientists can use Kubernetes' horizontal scaling to train many models in parallel and save significant training time. Furthermore, Kubernetes makes resource limits and quotas easy to implement, ensuring fair resource allocation and preventing conflicts between teams or projects. (A minimal sketch of submitting a training step as a Kubernetes Job appears after this item.)

Evaluation and Tuning

After training, AI models must be evaluated against validation datasets to determine their performance. By integrating with tools like Kubeflow and TensorFlow Extended (TFX), Kubernetes enables hyperparameter tuning and automated model evaluation. These frameworks provide prebuilt components for creating and managing AI pipelines on Kubernetes clusters. Data scientists can speed up the iterative process of improving model performance by building workflows that automate model evaluation, model selection, and hyperparameter tuning.

Deployment Phase

Once a model is sufficiently accurate, it must be used to make predictions on new data in real-world scenarios. Kubernetes facilitates the deployment of AI models by abstracting away infrastructure-related issues and providing tools for container orchestration and service discovery. Data scientists can bundle trained models into container images using tools like Docker, or use Kubernetes' support for custom resources such as Custom Resource Definitions (CRDs). When these containerized models are deployed as microservices, they can be accessed via RESTful APIs or gRPC endpoints.

Scaling and Monitoring

In production, AI models can face varying workloads and demand levels. Thanks to Kubernetes' auto-scaling functionality, resources can be adjusted dynamically based on real-time data such as CPU usage, memory consumption, or other application-specific indicators. This ensures effectiveness and efficient use of resources, particularly during periods of high demand. Furthermore, Kubernetes integrates easily with logging and monitoring tools such as Prometheus and Grafana to provide insight into the health and performance of AI applications. Data scientists can set up alerts and dashboards to monitor key indicators and respond quickly to anomalies or issues.

Reproducibility and Portability

One of Kubernetes' key advantages for AI pipelines is reproducibility and portability. Kubernetes manifests declaratively specify the desired state of the application, including dependencies, configurations, and environment variables. These manifests can be version-controlled with Git or other version control systems, which promotes collaboration and repeatability across environments. Furthermore, Kubernetes abstracts away the underlying infrastructure, making it simple to deploy AI pipelines on any cloud provider or internal data center with minimal or no adjustments.

In conclusion, Kubernetes is essential to coordinating end-to-end AI pipelines, covering the creation, training, and deployment of AI models at every stage of the process. By automating container orchestration and abstracting away infrastructure complexities, Kubernetes frees data scientists and machine learning engineers to concentrate on innovation rather than infrastructure maintenance. Organizations can use Kubernetes to deliver AI-powered applications that match the demands of today's changing business landscape, increase productivity, and accelerate the pace of AI innovation. The post Kubernetes & its Role in AI: Orchestrating End-to-End AI Pipelines appeared first on Amazic. View the full article
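To make the development and training phases above more concrete, here is a minimal sketch that submits a single containerized training step as a Kubernetes Job using the official Python client. The image name, namespace, resource requests, and training command are illustrative assumptions, not values from the article.

    # Minimal sketch: submit one containerized training step as a Kubernetes Job.
    # Assumes a reachable cluster via kubeconfig; the image, namespace, command,
    # and resource requests below are hypothetical placeholders.
    from kubernetes import client, config

    def submit_training_job(namespace: str = "ml-pipelines") -> None:
        config.load_kube_config()  # use config.load_incluster_config() inside a pod

        container = client.V1Container(
            name="train",
            image="registry.example.com/train:latest",        # hypothetical image
            command=["python", "train.py", "--epochs", "10"],  # hypothetical entrypoint
            resources=client.V1ResourceRequirements(
                requests={"cpu": "4", "memory": "8Gi"},
                limits={"nvidia.com/gpu": "1"},                # one GPU for training
            ),
        )
        job = client.V1Job(
            api_version="batch/v1",
            kind="Job",
            metadata=client.V1ObjectMeta(name="train-model"),
            spec=client.V1JobSpec(
                template=client.V1PodTemplateSpec(
                    spec=client.V1PodSpec(containers=[container], restart_policy="Never")
                ),
                backoff_limit=2,
            ),
        )
        client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)

    if __name__ == "__main__":
        submit_training_job()

Running many such Jobs in parallel, each in its own pod, is essentially how the horizontal scaling described in the training phase plays out in practice.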
  2. Overview In the competitive world of professional hockey, NHL teams are always seeking to optimize their performance. Advanced analytics has become increasingly important... View the full article
  3. Widely adopted by both developers and organisations, Kubeflow is an MLOps platform that runs on Kubernetes and automates machine learning (ML) workloads. It covers the entire ML lifecycle, enabling data scientists and machine learning engineers to develop and deploy ML models. Kubeflow is designed as a suite of leading open source projects that enable different capabilities such as model serving, training or hyperparameter tuning. At Canonical, we deliver Charmed Kubeflow – an official distribution of the upstream solution with additional security maintenance, tool integrations, and enterprise support and managed services – so we know a thing or two about the project. In our experience, one of the most important concepts to understand with respect to both Kubeflow itself and the broader ML lifecycle is machine learning pipelines. Taking advantage of pipelines is the best way to effectively deploy models at scale in production, so let's break down this critical component in the MLOps landscape.

What is an ML pipeline?

A machine learning pipeline is an important component of ML systems, enabling simplified experimentation and the ability to take models to production. It is a series of steps that automate how ML models are created, in order to streamline their development and deployment. ML pipelines reduce the complexity of the end-to-end ML lifecycle, helping professionals develop and deploy models. Among their benefits, ML pipelines ensure scalability thanks to their ability to handle large volumes of data while supporting collaboration and reproducibility. A core value of MLOps platforms such as Kubeflow is that they enable professionals to build and maintain ML pipelines.

What is Kubeflow Pipelines?

Kubeflow Pipelines, or KFP, is the heart of Kubeflow. It is the Kubeflow component that enables the creation of ML pipelines, helping you build and deploy container-based ML workflows that are portable and scalable. The main goals of Kubeflow Pipelines are to simplify the following processes: orchestration of end-to-end ML pipelines; experimentation with various ideas and techniques; experiment management; and reuse of components and pipelines, enabling users to quickly put together end-to-end solutions without having to rebuild each time.

Components of Kubeflow Pipelines

Kubeflow Pipelines is part of the Kubeflow project. It can be used as part of the project or as an independent tool. It is made up of three main components: a user interface (UI) for managing and tracking experiments, jobs, and runs; an engine for scheduling multi-step ML workflows; and an SDK for defining and manipulating pipelines and components.

Kubeflow Pipelines use cases

Kubeflow Pipelines is typically most useful for advanced users of Kubeflow or professionals who already have experience with machine learning. You don't necessarily need KFP in the experimentation phase of the ML journey, but it becomes useful when you want to take your models to production. The main use cases for KFP include:

Workflow automation: Data scientists and machine learning engineers often perform much of the initial experimentation phase manually to better understand optimisation possibilities and to iterate quickly. Once they have defined their workflow, they can use KFP to automate the process and save time.

Model deployment to production: Models are usually compiled into a binary file. Traditionally, for the model to be loaded onto a server where the requirements for inference are met, this file would be manually copied to the machine that hosts the application. KFP simplifies this process by enabling you to build automated pipelines to multiple applications or servers.

Model maintenance and updates: The ML lifecycle is an iterative process and models need to be updated periodically. KFP helps users run updates and rollbacks across multiple applications or servers. Once the model is updated in one place and the update transaction is complete, KFP ensures the update is quickly applied to all client applications.

Multi-tenant ML environment: Organisations often have large data and ML teams that need to share resources. KFP enables simple and effective sharing of the environment, where each collaborator gets an isolated environment. Resources are then scheduled and containers managed by the K8s cluster and tools such as Volcano. This helps professionals isolate workflows and keep track of pending and running jobs for each collaborator.

Benefits of KFP

Among machine learning specialists, Kubeflow Pipelines is widely adopted for a number of reasons. The most important benefits of KFP include:

Streamlined workflow automation: Kubeflow Pipelines allows users to define machine learning pipelines as a sequence of steps, each with its own inputs, outputs, and dependencies. This streamlines machine learning workflows and reduces the overhead and complexity of managing and executing pipelines.

Improved collaboration: Kubeflow Pipelines provides a central, shared platform for data scientists, machine learning engineers, and IT operations teams to collaborate on machine learning projects. It allows them to share pipelines and artifacts, and enables tracking and monitoring of pipelines across the entire organisation.

Enhanced performance and scalability: Kubeflow Pipelines runs on Kubernetes, which provides a scalable and flexible infrastructure for running machine learning pipelines and models. This allows you to easily scale pipelines up and down and ensure that they are performant and reliable.

Resource optimisation: KFP is a cloud-native application, so it can leverage the resource schedulers that Kubernetes platforms provide. This leads to optimised usage of existing resources and faster project delivery.

Extensive support for popular machine learning frameworks: KFP provides built-in support for popular machine learning frameworks like TensorFlow, PyTorch, and XGBoost, as well as a rich ecosystem of integrations and plugins for other tools and services. Charmed Kubeflow goes a step further and enables additional integrations with tools and frameworks such as NVIDIA NGC Containers, Triton Inference Server and MLflow.

While Kubeflow Pipelines is a feature-rich tool, it still presents some challenges for beginners. It comes with a steep learning curve and limited documentation. Since it is a fully open source tool, there is a large community that can help beginners, but it can be frustrating at times. You can alleviate these challenges by taking advantage of enterprise support or managed services from organisations that distribute Kubeflow.

Architecture of Kubeflow Pipelines

Kubeflow Pipelines is a complex component with capabilities that unblock users, enabling them to automate their workflows and reduce time spent on manual tasks. The architecture diagram from the Kubeflow community illustrates these capabilities. Users interact with KFP either through the user interface or through development tools such as Notebooks. Initially, users create components or specify a pipeline using the Kubeflow Pipelines domain-specific language (DSL). Once defined, the compiler transforms the Python code into a static YAML configuration (a minimal DSL sketch follows this item). The Pipeline Service then creates a pipeline run from the static configuration, calling the Kubernetes API server to create the Kubernetes resources (CRDs) needed to run the pipeline. If you have a resource scheduler integrated, you can use it to run the pipeline when resources are available or at a desired time. To execute the pipeline, the containers run inside Kubernetes pods, using orchestration controllers.

Two types of data can be stored. The first is metadata, which includes experiments, jobs, pipeline runs, and single scalar metrics. The second is artefacts, which includes pipeline packages, views, and large-scale metrics (time series). Metadata is stored in a MySQL database, whereas artefacts are stored in MinIO. Storing them in an external component also enables portability, so artefacts can be migrated to different clusters or environments. Kubernetes resources created by the Pipeline Service are monitored by the Persistence Agent. To enable reproducibility, the inputs and outputs of the containers are recorded as metadata; they consist of parameters or data artefact URIs, and they allow professionals to reuse configurations, replicate tasks, and check whether the results match. The Pipeline web server gives users a visual understanding of the steps in their Kubeflow Pipelines, presenting information such as the list of pipelines currently running, the history of pipeline executions, data artefacts, and logs for debugging.

Get started with Kubeflow Pipelines

To access Kubeflow Pipelines, users can either deploy it independently or as part of the Kubeflow project. For simplified deployment, we recommend using Charmed Kubeflow:

Deploy Charmed Kubeflow following the tutorial. You can do this in any environment, including public cloud or on-prem. Ensure that you have enough resources available so you do not run into problems along the way.

Access the Kubeflow dashboard. If you are accessing it from a VM or a public cloud, make sure you change the SOCKS proxy settings. From the dashboard you have different options, including uploading an existing pipeline or creating a new one.

Clone this repository from GitHub, which contains a simple example of how to use some of the components of Kubeflow.

Access the examples from the Notebook. There are several pipelines you can run, edit or play with. They are just examples; to build your own pipeline, check the official documentation of the Kubeflow project.

Further reading: Kubeflow vs MLflow; Launch NGC containers with Kubeflow; MLOps pipelines with Kubeflow, MLflow and Seldon. View the full article
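As a companion to the architecture description above, the following is a minimal sketch of the DSL-to-YAML flow using the KFP v2 SDK. The component bodies and file name are illustrative assumptions; real pipelines would wrap actual preprocessing and training code.

    # Sketch of the KFP flow: define components and a pipeline in the Python DSL,
    # then compile them to the static YAML that the Pipeline Service executes.
    from kfp import dsl, compiler

    @dsl.component
    def preprocess(rows: int) -> int:
        # Placeholder preprocessing step; pretend it cleans `rows` records.
        return rows

    @dsl.component
    def train(rows: int) -> str:
        # Placeholder training step; returns a fake model identifier.
        return f"model-trained-on-{rows}-rows"

    @dsl.pipeline(name="demo-training-pipeline")
    def demo_pipeline(rows: int = 1000):
        cleaned = preprocess(rows=rows)
        train(rows=cleaned.output)

    if __name__ == "__main__":
        # Produces pipeline.yaml, which can then be uploaded through the KFP UI or SDK.
        compiler.Compiler().compile(demo_pipeline, "pipeline.yaml")

The compiled YAML is the static configuration that the Pipeline Service turns into Kubernetes resources, as described in the architecture section above.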
  4. Could you tell me whether we can implement 2FA while my pipeline runs and logs in to my server? Currently it logs in to the server with a username/password, and I want to add an extra layer of 2FA. Is there any mechanism to do this? Kindly reply ASAP. Thanks,
  5. There are several steps involved in implementing a data pipeline that integrates Apache Kafka with AWS RDS and uses AWS Lambda and API Gateway to feed data into a web application. Here is a high-level overview of how to architect this solution. Step 1 is to set up Apache Kafka, a distributed streaming platform capable of handling trillions of events a day. To set up Kafka, you can either install it on an EC2 instance or use Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. View the full article
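As a rough illustration of the first two hops of that architecture (Kafka into RDS), here is a hedged Python sketch. The library choices (kafka-python and psycopg2), broker addresses, topic, table, and connection details are all assumptions for the example, not part of the original post.

    # Illustrative sketch: publish events to Kafka, then consume them into an
    # RDS PostgreSQL table. All endpoints and credentials are placeholders.
    import json
    from kafka import KafkaProducer, KafkaConsumer
    import psycopg2

    BROKERS = ["b-1.example.kafka.us-east-1.amazonaws.com:9092"]  # hypothetical MSK brokers

    def publish(event: dict) -> None:
        producer = KafkaProducer(
            bootstrap_servers=BROKERS,
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        producer.send("orders", value=event)
        producer.flush()

    def consume_into_rds() -> None:
        consumer = KafkaConsumer(
            "orders",
            bootstrap_servers=BROKERS,
            value_deserializer=lambda b: json.loads(b.decode("utf-8")),
            auto_offset_reset="earliest",
        )
        conn = psycopg2.connect(host="mydb.example.rds.amazonaws.com", dbname="app",
                                user="app", password="REPLACE_ME")  # prefer Secrets Manager
        conn.autocommit = True
        cur = conn.cursor()
        for msg in consumer:
            cur.execute("INSERT INTO orders (payload) VALUES (%s)", (json.dumps(msg.value),))

In the full architecture, a Lambda function behind API Gateway would then read from the RDS table to serve the web application.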
  6. This post is the second part of our two-part series on the latest performance improvements of stateful pipelines. The first part of this... View the full article
  7. Hello everyone, I am currently working on integrating automated rollbacks into our CI/CD pipeline to ensure a more robust deployment process. Our team is looking for the best methods and tools we can adopt to make this transition as smooth as possible. I came across this article (https://docs.aws.amazon.com/codedeploy/latest/userguide/deployments-rollback-and-redeploy.html), which provided some insights, but I'd love to hear from those of you who have hands-on experience in this area: What strategies have you implemented for automated rollbacks? How do you handle rollback complexities, especially when dealing with dependencies? Are there any specific tools or platforms that you recommend? What lessons have you learned and what pitfalls should we be aware of? Your real-world experiences will supplement what we've learned from the literature and help us make more informed decisions. Thanks in advance!
  8. DevOps has become a game-changer, allowing teams to work together more efficiently. But there's a big concern: security often gets left behind. This article is... View the full article
  9. I have a Tekton pipeline that will be posting to a REST API. I need to pass credentials to it, but as far as I can tell from this documentation, the only options for a pipeline are Git and Docker authentication. How would I securely store username/password credentials that I can pass into a pipeline and ultimately convert to Basic Auth for the REST request?
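One common pattern for this (beyond the Git and Docker credential types that Tekton documents) is to keep the username and password in an ordinary Kubernetes Secret, expose it to the Task step as environment variables via secretKeyRef, and build the Basic Auth header inside the step. Below is a hedged Python sketch of such a step script; the environment variable names and the API URL are assumptions.

    # Sketch of a step script that reads credentials injected from a Kubernetes
    # Secret (e.g. via env.valueFrom.secretKeyRef in the Task definition) and
    # sends a request with HTTP Basic Auth. Names and URL are placeholders.
    import base64
    import os
    import urllib.request

    def post_with_basic_auth(url: str, body: bytes) -> int:
        user = os.environ["API_USERNAME"]      # injected from the Secret
        password = os.environ["API_PASSWORD"]  # injected from the Secret
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

        req = urllib.request.Request(url, data=body, method="POST")
        req.add_header("Authorization", f"Basic {token}")
        req.add_header("Content-Type", "application/json")
        with urllib.request.urlopen(req) as resp:
            return resp.status

    if __name__ == "__main__":
        print(post_with_basic_auth("https://api.example.com/items", b'{"name": "demo"}'))

The Secret itself never appears in the pipeline definition; only the reference to it does, which keeps the credentials out of source control.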
  10. Security is critical in all phases of software development, including conception, creation, and release. DevSecOps is a practice that has grown in popularity as a means of ensuring the security of a web application or software product. According to the AWS homepage, "DevSecOps is the practice of integrating security testing into every stage of the software development process. It consists of tools and methods that promote collaboration among developers, security experts, and operational teams in order to create software that is both efficient and secure. DevSecOps brings a cultural shift that makes security a shared responsibility for all software developers." View the full article
  11. Customers often ask for help with implementing Blue/Green deployments to Amazon Elastic Container Service (Amazon ECS) using AWS CodeDeploy. Their use cases usually involve cross-Region and cross-account deployment scenarios. These requirements are challenging enough on their own, but in addition to those, there are specific design decisions that need to be considered when using CodeDeploy. These include how to configure CodeDeploy, when and how to create CodeDeploy resources (such as Application and Deployment Group), and how to write code that can be used to deploy to any combination of account and Region. Today, I will discuss those design decisions in detail and how to use CDK Pipelines to implement a self-mutating pipeline that deploys services to Amazon ECS in cross-account and cross-Region scenarios. At the end of this blog post, I also introduce a demo application, available in Java, that follows best practices for developing and deploying cloud infrastructure using AWS Cloud Development Kit (AWS CDK)... View the full article
  12. Palo Alto Networks' Daniel Krivelevich shares a general three-step framework organizations can use to secure the CI/CD pipeline and surrounding areas. View the full article
  13. In today’s multi-cloud world, images (such as AMIs for Amazon EC2, virtual machines, Docker containers, and more) lay the foundation for modern infrastructure, security, networking, and applications. Enterprises adopting multi-cloud typically start by using Terraform for centralized provisioning, but Terraform does not handle the details of image creation and management. In many organizations, the workflows in place to create and manage images are siloed, time-consuming, and complex, leading to slow spin-up times and human errors that pose security risks. Organizations need standard processes to ensure all images throughout their infrastructure estate are secure, compliant, and easily accessible... View the full article
  14. HashiCorp Terraform is a popular tool for managing infrastructure as code (IaC). By defining your IaC using Terraform, you can use version control with your infrastructure configuration and also automate infrastructure deployment in a consistent and repeatable way. Azure DevOps Pipelines can be used to set up YAML pipelines that instrument Terraform infrastructure deployments using […] The article Deploy Terraform using Azure DevOps YAML Pipelines appeared first on Build5Nines. View the full article
  15. A powerful way to write and build Azure Functions is to use C# Script. This allows you to easily write functions directly within the Azure Portal, or even using source control with simple .csx script files. You don't need to pre-compile the function before deploying it to Azure and can easily edit and release changes. […] The article Deploy C# Script Function App using Azure DevOps YAML Pipeline appeared first on Build5Nines. View the full article
  16. Many organizations adopt DevOps practices to innovate faster by automating and streamlining the software development and infrastructure management processes. Beyond cultural adoption, DevOps also suggests following certain best practices, and Continuous Integration and Continuous Delivery (CI/CD) is among the important ones to start with. CI/CD practice reduces the time it takes to release new software updates by automating deployment activities. Many tools are available to implement this practice. Although AWS has a set of native tools to help achieve your CI/CD goals, it also offers flexibility and extensibility for integrating with numerous third-party tools. In this post, you will use GitHub Actions to create a CI/CD workflow and AWS CodeDeploy to deploy a sample Java SpringBoot application to Amazon Elastic Compute Cloud (Amazon EC2) instances in an Auto Scaling group.

GitHub Actions is a feature on GitHub's popular development platform that helps you automate your software development workflows in the same place that you store code and collaborate on pull requests and issues. You can write individual tasks called actions, and then combine them to create a custom workflow. Workflows are custom automated processes that you can set up in your repository to build, test, package, release, or deploy any code project on GitHub. AWS CodeDeploy is a deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, serverless AWS Lambda functions, or Amazon Elastic Container Service (Amazon ECS) services.

Solution Overview

The solution utilizes the following services:

GitHub Actions – workflow orchestration tool that will host the pipeline.
AWS CodeDeploy – AWS service to manage deployment on the Amazon EC2 Auto Scaling group.
AWS Auto Scaling – AWS service to help maintain application availability and elasticity by automatically adding or removing Amazon EC2 instances.
Amazon EC2 – destination compute server for the application deployment.
AWS CloudFormation – AWS infrastructure-as-code (IaC) service used to spin up the initial infrastructure on the AWS side.
IAM OIDC identity provider – federated authentication service to establish trust between GitHub and AWS, allowing GitHub Actions to deploy on AWS without maintaining AWS secrets and credentials.
Amazon Simple Storage Service (Amazon S3) – stores the deployment artifacts.

The solution architecture (shown as a diagram in the original post) works as follows: the developer commits code changes from their local repo to the GitHub repository (in this post the GitHub Action is triggered manually, but this can be automated). The GitHub Action triggers the build stage. GitHub's OpenID Connect (OIDC) provider uses tokens to authenticate to AWS and access resources. The GitHub Action uploads the deployment artifacts to Amazon S3 and invokes CodeDeploy, which triggers the deployment to Amazon EC2 instances in the Auto Scaling group. CodeDeploy downloads the artifacts from Amazon S3 and deploys them to the Amazon EC2 instances.

Prerequisites

Before you begin, you must complete the following prerequisites: an AWS account with permissions to create the necessary resources; a GitHub account with permissions to configure GitHub repositories, create workflows, and configure GitHub secrets; and a Git client to clone the provided source code.

Steps

The following steps provide a high-level overview of the walkthrough: clone the project from the AWS code samples repository; deploy the AWS CloudFormation template to create the required services; update the source code; set up GitHub secrets; integrate CodeDeploy with GitHub; trigger the GitHub Action to build and deploy the code; and verify the deployment.

Download the source code

Clone the source code repository aws-codedeploy-github-actions-deployment:

git clone https://github.com/aws-samples/aws-codedeploy-github-actions-deployment.git

Create an empty repository in your personal GitHub account (to create a GitHub repository, see Create a repo) and clone it to your computer. You can ignore the warning about cloning an empty repository.

git clone https://github.com/<username>/<repoName>.git

Copy the code. We need the contents of the hidden .github folder for the GitHub Actions to work.

cp -r aws-codedeploy-github-actions-deployment/. <new repository> e.g. GitActionsDeploytoAWS

You should now have the following folder structure in your local repository: the .github folder contains the actions defined in the YAML file; the aws/scripts folder contains code to run at the different deployment lifecycle events; the cloudformation folder contains the template.yaml file to create the required AWS resources; spring-boot-hello-world-example is a sample application used by GitHub Actions to build and deploy; and the root of the repo contains appspec.yml, which CodeDeploy requires to perform the deployment on Amazon EC2 (find more details here). The following commands make sure that your remote repository points to your personal GitHub repository:

git remote remove origin
git remote add origin <your repository url>
git branch -M main
git push -u origin main

Deploy the CloudFormation template

To deploy the CloudFormation template, complete the following steps. Open the AWS CloudFormation console and sign in with your account ID, user name, and password. Check your Region, as this solution uses us-east-1. If this is a new AWS CloudFormation account, select Create New Stack; otherwise, select Create Stack. Select Template is ready, select Upload a template file, select Choose File, and navigate to the template.yaml file in your cloned repository at "aws-codedeploy-github-actions-deployment/cloudformation/template.yaml". Select the file, then select Next. In Specify Stack Details, add or modify the values as needed:

Stack name = CodeDeployStack
VPC and Subnets = pre-populated for you; you can change these values if you prefer to use your own subnets
GitHubThumbprintList = 6938fd4d98bab03faadb97b34396831e3780aea1
GitHubRepoName = name of the personal GitHub repository you created

On the Options page, select Next, select the acknowledgement box to allow the creation of IAM resources, and then select Create. CloudFormation takes approximately 10 minutes to create all of the resources. The stack creates the following resources: two Amazon EC2 Linux instances with the Tomcat server and CodeDeploy agent installed; an Auto Scaling group with an internet-facing Application Load Balancer; a CodeDeploy application and deployment group; an Amazon S3 bucket to store build artifacts; an IAM OIDC identity provider; an instance profile for Amazon EC2; a service role for CodeDeploy; and security groups for the ALB and Amazon EC2.

Update the source code

On the AWS CloudFormation console, select the Outputs tab and note the Amazon S3 bucket name and the ARN of the GitHub IAM role; we will use these in the next steps. Update the Amazon S3 bucket in the workflow file deploy.yml: navigate to /.github/workflows/deploy.yml from your project root directory, replace ##s3-bucket## with the name of the Amazon S3 bucket created previously, and replace ##region## with your AWS Region. Update the Amazon S3 bucket name in after-install.sh: navigate to aws/scripts/after-install.sh; this script copies the deployment artifact from the Amazon S3 bucket to the Tomcat webapps folder. Remember to save all of the files and push the code to your GitHub repo. Verify that you're in your Git repository folder by running the following command:

git remote -v

You should see your remote branch address, similar to the following:

username@3c22fb075f8a GitActionsDeploytoAWS % git remote -v
origin git@github.com:<username>/GitActionsDeploytoAWS.git (fetch)
origin git@github.com:<username>/GitActionsDeploytoAWS.git (push)

Now run the following commands to push your changes:

git add .
git commit -m "Initial commit"
git push

Setup GitHub Secrets

The GitHub Actions workflows must access resources in your AWS account. Here we use an IAM OpenID Connect identity provider and an IAM role with IAM policies to access CodeDeploy and the Amazon S3 bucket. OIDC lets your GitHub Actions workflows access resources in AWS without needing to store AWS credentials as long-lived GitHub secrets. The remaining configuration is stored as GitHub secrets within your GitHub repository, under Settings > Secrets (for more information, see "GitHub Actions secrets"). Navigate to your GitHub repository, select the Settings tab, select Secrets on the left menu bar, then Actions under Secrets, and select New repository secret. Enter the secret name 'IAMROLE_GITHUB' and, as the value, the ARN of GitHubIAMRole, which you copied from the CloudFormation output section.

Integrate CodeDeploy with GitHub

For CodeDeploy to be able to perform deployment steps using scripts in your repository, it must be integrated with GitHub. The CodeDeploy application and deployment group are already created for you; use them in the next step:

CodeDeploy application = CodeDeployAppNameWithASG
Deployment group = CodeDeployGroupName

To link a GitHub account to an application in CodeDeploy, follow the instructions on this page up to step 10. You can cancel the process after completing step 10; you don't need to create a deployment.

Trigger the GitHub Actions Workflow

You now have the required AWS resources and have configured GitHub to build and deploy the code to Amazon EC2 instances. The GitHub Actions defined in GITHUBREPO/.github/workflows/deploy.yml let you run the workflow, which is currently set up to be run manually. To run it, go to your GitHub repo and select the Actions tab, select the Build and Deploy link, and select Run workflow. After a few seconds, the workflow will be displayed; select Build and Deploy. You will see two stages: Build and Package, and Deploy. The Build and Package stage builds the sample SpringBoot application, generates the war file, and then uploads it to the Amazon S3 bucket; you should be able to see the war file in the bucket. In the Deploy stage, the workflow invokes the CodeDeploy service and triggers the deployment. (A boto3 sketch of the equivalent CodeDeploy call appears after this item.)

Verify the deployment

Log in to the AWS Console and navigate to the CodeDeploy console. Select the application name and deployment group; you will see the status as Succeeded if the deployment is successful. Point your browser to the URL of the Application Load Balancer. Note: you can get the URL from the output section of the CloudFormation stack or from the Amazon EC2 console under Load Balancers.

Optional – Automate the deployment on Git push

The workflow can be automated by changing the following lines in your .github/workflow/deploy.yml file from:

workflow_dispatch: {}

to:

#workflow_dispatch: {}
push:
  branches: [ main ]
pull_request:

GitHub Actions will interpret this to automatically run the workflow on every push or pull request on the main branch. After testing the end-to-end flow manually, you can enable the automated deployment.

Clean up

To avoid incurring future charges, clean up the resources that you created: empty the Amazon S3 bucket; delete the CloudFormation stack (CodeDeployStack) from the AWS console; and delete the GitHub secret ('IAMROLE_GITHUB') by going to the repository settings on GitHub, selecting Secrets under Actions, selecting IAMROLE_GITHUB, and deleting it.

Conclusion

In this post, you saw how to leverage GitHub Actions and CodeDeploy to securely deploy a Java SpringBoot application to Amazon EC2 instances behind an AWS Auto Scaling group. You can further add other stages to your pipeline, such as test and security scanning. Additionally, this solution can be used for other programming languages.

About the Authors

Mahesh Biradar is a Solutions Architect at AWS. He is a DevOps enthusiast and enjoys helping customers implement cost-effective architectures that scale. Suresh Moolya is a Cloud Application Architect with Amazon Web Services. He works with customers to architect, design, and automate business software at scale on AWS cloud. View the full article
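For readers curious what the Deploy stage amounts to on the AWS side, here is a hedged boto3 sketch that creates a CodeDeploy deployment from an S3 revision. The application and deployment group names match the ones the CloudFormation stack creates in this walkthrough; the bucket, key, and region are placeholders you would replace with your own values.

    # Sketch: create a CodeDeploy deployment from the bundle the build uploaded
    # to S3, roughly what the GitHub Actions deploy step asks CodeDeploy to do.
    import boto3

    def trigger_deployment(bucket: str, key: str, region: str = "us-east-1") -> str:
        codedeploy = boto3.client("codedeploy", region_name=region)
        response = codedeploy.create_deployment(
            applicationName="CodeDeployAppNameWithASG",   # created by the stack
            deploymentGroupName="CodeDeployGroupName",    # created by the stack
            revision={
                "revisionType": "S3",
                "s3Location": {"bucket": bucket, "key": key, "bundleType": "zip"},
            },
            description="Manual deployment triggered outside GitHub Actions",
        )
        return response["deploymentId"]

    if __name__ == "__main__":
        # Placeholder bucket and key; use the values from your own CloudFormation output.
        print(trigger_deployment("my-artifact-bucket", "spring-boot-hello-world.zip"))

Watching the returned deployment ID in the CodeDeploy console corresponds to the "Verify the deployment" step above.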
  17. Customers can now share AMIs from Image Builder pipelines with AWS accounts in multiple AWS regions, using the AWS Command Line Interface (CLI). View the full article
  18. The need to deliver applications faster and with better quality is widespread across all industries and keeps increasing every year (CNCF Cloud Native Survey 2020). OpenShift, as the enterprise Kubernetes platform for developers, is sharply focused on enabling organizations to automate application delivery through DevOps practices such as continuous integration and continuous delivery (CI/CD). We are excited to announce the general availability of OpenShift Pipelines and OpenShift GitOps as the foundation of cloud-native CI/CD and GitOps on Red Hat OpenShift Container Platform. OpenShift Pipelines provides a cloud-native continuous integration solution based on Tekton, a Continuous Delivery Foundation (CDF) project. OpenShift GitOps enables GitOps workflows for application deployments and configuration of applications and Kubernetes clusters through Argo CD, a Cloud Native Computing Foundation (CNCF) project.

OpenShift Pipelines: Cloud-Native Continuous Integration

Tekton is the core of OpenShift Pipelines and provides a Kubernetes-native framework for creating pipelines that automate the delivery of applications and run natively as pods on the cluster. Tekton is built on top of Kubernetes concepts, an operational model that, combined with the serverless execution model, significantly reduces the operational overhead of continuous integration infrastructure for organizations. Running pipelines in isolated pods on OpenShift with no central shared server allows teams to own their delivery pipelines without risking conflicts or undesired dependencies among teams. Building on Tekton, OpenShift Pipelines assists developers by providing pipeline blueprints that are automatically created when importing applications to the OpenShift platform. These blueprints are created by admins based on their organization's unique business and security requirements and delivered to development teams through the OpenShift Console. In addition, developers can use the pipeline builder to compose advanced CI workflows for their applications. OpenShift Pipelines provides a curated list of Tekton ClusterTasks for use when authoring pipelines manually or through the graphical pipeline builder, covering common CI tasks such as performing Git commands, building container images from application source, and pushing images to registries, to name a few.

Using pipelines in OpenShift has never been easier, with native integration in the OpenShift Console allowing developers to configure webhooks, execute pipelines on code changes, and view results and logs directly alongside their applications. Additionally, pipeline logs are made available in OpenShift Logging and are aggregated with the platform logs for audit and other purposes. Developers using the command line and IDEs can take advantage of the Tekton CLI, the Tekton extension for Visual Studio Code, and the Tekton plug-in for IntelliJ to interact with pipelines without leaving their environment and to create, start, view, and perform actions on the cluster directly from the command line. Tekton Hub provides a central hub for finding reusable Tekton Tasks when authoring pipelines. Developers can use the Tekton CLI, the Visual Studio Code extension, or the IntelliJ plug-in to search for Tasks in Tekton Hub directly from the command line or IDE and install them on the cluster for use within their pipelines.

OpenShift Pipelines now includes the following new capabilities:

Pipeline log aggregation in the OpenShift Logging central log management
Automatic proxy configuration on TaskRuns
TLS for EventListener pods
ClusterTriggerBindings for Bitbucket and GitLab
A Jenkins-to-Tekton migration guide

OpenShift GitOps: Continuous Delivery With GitOps

Git has been at the center of software development for a long time, and many teams have adopted the Git pull-request workflow for developing code. GitOps is an approach to continuous delivery (CD) that treats Git as the single source of truth for everything, including infrastructure, platform, and application configurations. Teams can then take advantage of Git workflows to drive cluster operations and application delivery, enabling predictable, more secure, and repeatable changes to clusters. At the same time, observability and visibility of the actual state are increased, and possible configuration drift can be detected easily and immediately through the GitOps workflow. GitOps maintains full transparency through Git audit capabilities and provides a straightforward mechanism to roll back to any desired version across multiple OpenShift and Kubernetes clusters.

OpenShift GitOps is built around Argo CD as the declarative GitOps engine that enables GitOps workflows across multicluster OpenShift and Kubernetes infrastructure. Using Argo CD, teams can sync the state of OpenShift and Kubernetes clusters and applications with the content of Git repositories, manually or automatically. Argo CD continuously compares the state of the clusters and the Git repositories to identify any drift and can automatically bring the cluster back to the desired state if a change is detected on the Git repository or the cluster. The auto-healing capabilities in Argo CD increase the security of the CD workflow by preventing undesired, unauthorized, or unvetted changes that might be made directly on the cluster, unintentionally or through security breaches.

Once OpenShift GitOps is enabled on the cluster through OperatorHub, the default Argo CD dashboard can be accessed through the application launcher in the OpenShift Console. The default Argo CD instance is configured with sufficient privileges to drive cluster configuration management, such as installing operators from OperatorHub and configuring cluster and OperatorHub operators, user roles and access (RBAC), storage, and more. As organizations adopt DevOps values and culture, many opt to share application delivery responsibilities with development teams and enable them to own the delivery of their applications. OpenShift GitOps enables these organizations to deploy namespace-scoped instances of Argo CD through the Developer Catalog and transfer ownership of these instances to the development teams. Namespace-scoped Argo CD instances are configured by default to restrict deployments and configuration of resources to the namespaces accessible to the particular development team and to prohibit changes to cluster configurations or other namespaces. Additional privileges may be granted to each Argo CD instance if desired by the cluster admins.

Furthermore, Argo CD's flexible deployment topology adapts to the organization's GitOps process and can act as a central hub for pushing changes from Git repositories to remote OpenShift and Kubernetes clusters on public cloud (EKS, AKS, and GCP) as well as pulling changes into the cluster it is running on. This eliminates the need for a central layer to be aware of the cluster fleet within the organization. In addition to Argo CD, OpenShift GitOps provides an opinionated GitOps workflow based on Tekton (provided through OpenShift Pipelines), Argo CD, and Kustomize, which is bootstrapped by the GitOps Application Manager CLI and is included as a Tech Preview feature. The GitOps Application Manager CLI populates a configuration Git repository with the Kubernetes manifests for the application across its environments and uses Git workflows to promote the application through its life cycle to the next environment. Organizations that want to expand their GitOps workflow to cluster life-cycle management, policy, and compliance can take advantage of Red Hat Advanced Cluster Management for Kubernetes, which uses OpenShift GitOps to support GitOps workflows for cluster management operations.

OpenShift GitOps now includes the following capabilities:

Log aggregation in the OpenShift Logging central log management
Authentication integration guidance with Red Hat SSO and OpenShift
Dynamic generation of Argo CD Applications with ApplicationSets (Tech Preview)
Collection of Argo CD metrics through the OpenShift monitoring stack (Prometheus)
OutOfSync alerts in the OpenShift monitoring stack (AlertManager)

Tekton and Argo CD Communities

At Red Hat, we believe in creating better technology through the open source model and the innovation driven out of open source communities. Red Hat continues to be an active participant in the Tekton and Argo CD communities and collaborates with other contributors to drive these projects forward as core technologies that power cloud-native continuous integration and continuous delivery on OpenShift. To try out OpenShift Pipelines and OpenShift GitOps, visit http://learn.openshift.com/gitops. View the full article
  19. Whether you are building a data lake, a data analytics pipeline, or a simple data feed, you may have small volumes of data that need to be processed and refreshed regularly. This post shows how you can build and deploy a micro extract, transform, and load (ETL) pipeline to handle this requirement. In addition, you configure a reusable Python environment to build and deploy micro ETL pipelines using your source of data... View the full article
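The post's full implementation is not reproduced here, but a micro ETL step of this kind often boils down to a short Python function: pull a small file, reshape it, and write the result somewhere durable. The source URL, the "amount" column, and the S3 bucket and key below are illustrative assumptions rather than details from the post.

    # Illustrative micro ETL sketch: extract a small CSV over HTTP, apply a
    # trivial transform, and load the result to S3. All names are placeholders.
    import csv
    import io
    import urllib.request
    import boto3

    def run_micro_etl(source_url: str, bucket: str, key: str) -> int:
        # Extract: stream the CSV and parse it into dictionaries.
        with urllib.request.urlopen(source_url) as resp:
            rows = list(csv.DictReader(io.TextIOWrapper(resp, encoding="utf-8")))

        # Transform: keep only rows with a positive "amount" value.
        kept = [r for r in rows if float(r.get("amount", 0)) > 0]

        # Load: write the refreshed feed back to S3 as CSV.
        out = io.StringIO()
        if kept:
            writer = csv.DictWriter(out, fieldnames=list(kept[0].keys()))
            writer.writeheader()
            writer.writerows(kept)
        boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                      Body=out.getvalue().encode("utf-8"))
        return len(kept)

    if __name__ == "__main__":
        run_micro_etl("https://example.com/feed.csv", "my-data-bucket", "refreshed/feed.csv")

Packaged into a small scheduled job or serverless function, a function like this is the kind of "micro" pipeline the post describes building and deploying with a reusable Python environment.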
  20. While valuable, secrets management can also be difficult for DevOps teams to employ. Here's what you need to know. Business is all about speed. Companies want to innovate and deliver functionality faster to remain competitive. This explains the increasing popularity of DevOps as a go-to model for rapid application delivery. A recent Gartner report indicated […] The post Why Secrets Management is Critical to DevOps Pipeline Security appeared first on DevOps.com. View the full article