Search the Community

Showing results for tags 'frameworks'.

Found 17 results

  1. In today’s AI era, data is your competitive edge. There has never been a more exciting time in technology, with AI creating entirely new ways to solve problems, engage customers, and work more efficiently. However, most enterprises still struggle with siloed data, which stifles innovation, keeps vital insights locked away, and can reduce the value AI delivers across the business.

     Discover a faster, smarter way to innovate

     Google Cloud Cortex Framework accelerates your ability to unify enterprise data for connected insights, and provides new opportunities for AI to transform customer experiences, boost revenue, and reduce costs that can otherwise stay hidden in your company’s data and applications. Built on an AI-ready Data Cloud foundation, Cortex Framework includes what you need to design, build, and deploy solutions for specific business problems and opportunities, including endorsed reference architectures and packaged business solution deployment content. In this blog, we provide an overview of Cortex Framework and highlight some recent enhancements.

     Get a connected view of your business with one data foundation

     Cortex Framework enables one data foundation for businesses by bridging and enriching private, public, and community insights for deeper analysis. Our latest release extends Cortex Data Foundation with new data and AI solutions for enterprise data sources including Salesforce Marketing Cloud, Meta, SAP ERP, and Dun & Bradstreet. Together, this data unlocks insights across the enterprise and opens up opportunities for optimization and innovation.

     New intelligent marketing use cases

     Drive more intelligent marketing strategies with one data foundation for your enterprise data, including integrated sources like Google Ads, Campaign Manager 360, TikTok and now — Salesforce Marketing Cloud and Meta connectivity with BigQuery via Cortex Framework, with predefined data ingestion templates, data models, and sample dashboards. Together with other business data available in Cortex Data Foundation, such as sales and supply chain sources, you can accelerate insights and answer questions like: How does my overall campaign and audience performance relate to sales and supply chain?

     New sustainability management use cases

     Want more timely insights into environmental, social and governance (ESG) risks and opportunities? You can now manage ESG performance and goals with new vendor ESG performance insights that connect Dun & Bradstreet ESG ranking data with your SAP ERP supplier data, along with predefined data ingestion templates, data models, and a sample dashboard focused on sustainability insights for informed decision making. Answer questions like: “What is my raw material suppliers' ESG performance against industry peers?” “What is their ability to measure and manage GHG emissions?” and “What is their adherence and commitment to environmental compliance and corporate governance?”

     New simplified finance use cases

     Simplify financial insights across the business to make informed decisions about liquidity, solvency, and financial flexibility to feed into strategic growth investment opportunities — now with predefined data ingestion templates, data models, and sample dashboards to help you discern new insights with balance sheet and income statement reporting on SAP ERP data.
     Accelerate AI innovation with secured data access

     To help organizations build out a data mesh for more optimized data discovery, access control, and governance when interacting with Cortex Data Foundation, our new solution content offers a metadata framework built on BigQuery and Dataplex that:

       • Organizes Cortex Data Foundation pre-defined data models into business domains
       • Augments Cortex Data Foundation tables, views, and columns with semantic context to empower search and discovery of data assets
       • Enables natural language to SQL capabilities by providing logical context for Cortex Data Foundation content to LLMs and gen AI applications
       • Annotates data access policies to enable consistent enforcement of access controls and masking of sensitive columns

     With a data mesh in place, Cortex Data Foundation models allow for more efficient generative AI search and discovery, as well as fine-grained access policies and governance.

     Data and AI bring next-level innovation and efficiency

     Will your business lead the way? Learn more about our portfolio of solutions by tuning in to our latest Next ‘24 session, ANA107, and checking out our website. View the full article
  2. MattKC painstakingly ports .NET from Windows 98 to Windows 95, enabling many applications that would not otherwise run. View the full article
  3. Unlock the power of your data with an effective data governance framework for security, compliance, and decision-making. View the full article
  4. Introduction

     Organizations often use Terraform modules to orchestrate complex resource provisioning and provide a simple interface for developers to enter the required parameters to deploy the desired infrastructure. Modules enable code reuse and provide a method for organizations to standardize deployment of common workloads such as a three-tier web application, a cloud networking environment, or a data analytics pipeline. When building Terraform modules, it is common for the module author to start with manual testing. Manual testing is performed using commands such as terraform validate for syntax validation, terraform plan to preview the execution plan, and terraform apply followed by manual inspection of the resource configuration in the AWS Management Console. Manual testing is prone to human error, is not scalable, and can result in unintended issues. Because modules are used by multiple teams in the organization, it is important to ensure that any changes to the modules are extensively tested before release. In this blog post, we will show you how to validate Terraform modules and how to automate the process using a Continuous Integration/Continuous Deployment (CI/CD) pipeline.

     Terraform Test

     Terraform test is a new testing framework for module authors to perform unit and integration tests for Terraform modules. Terraform test can create infrastructure as declared in the module, run validation against the infrastructure, and destroy the test resources regardless of whether the test passes or fails. Terraform test will also provide warnings if there are any resources that cannot be destroyed. Terraform test uses the same HashiCorp Configuration Language (HCL) syntax used to write Terraform modules, which reduces the burden for module authors to learn other tools or programming languages. Module authors run the tests using the terraform test command, which is available in Terraform CLI version 1.6 or higher. Module authors create test files with the extension *.tftest.hcl. These test files are placed in the root of the Terraform module or in a dedicated tests directory. The following elements are typically present in a Terraform tests file:

       • Provider block: optional, used to override the provider configuration, such as selecting the AWS region where the tests run.
       • Variables block: the input variables passed into the module during the test, used to supply non-default values or to override default values for variables.
       • Run block: used to run a specific test scenario. There can be multiple run blocks per test file; Terraform executes them in order. In each run block you specify the Terraform command (plan or apply) and the test assertions. Module authors can specify conditions such as length(var.items) != 0. A full list of condition expressions can be found in the HashiCorp documentation.

     Terraform tests are performed in sequential order, and at the end of the Terraform test execution any failed assertions are displayed.

     Basic test to validate resource creation

     Now that we understand the basic anatomy of a Terraform tests file, let's create a basic test to validate the functionality of the following Terraform configuration, which creates an AWS CodeCommit repository with the prefix repo-.

        # main.tf
        variable "repository_name" {
          type = string
        }

        resource "aws_codecommit_repository" "test" {
          repository_name = format("repo-%s", var.repository_name)
          description     = "Test repository."
        }

     Now we create a Terraform test file in the tests directory. See the following directory structure as an example:

        ├── main.tf
        └── tests
            └── basic.tftest.hcl

     For this first test, we will not perform any assertion except validating that the Terraform execution plan runs successfully. In the tests file, we create a variables block to set the value for the variable repository_name. We also add a run block with command = plan to instruct Terraform test to run terraform plan. The completed test should look like the following:

        # basic.tftest.hcl
        variables {
          repository_name = "MyRepo"
        }

        run "test_resource_creation" {
          command = plan
        }

     Now we will run this test locally. First, ensure that you are authenticated into an AWS account, and run the terraform init command in the root directory of the Terraform module. After the provider is initialized, start the test using the terraform test command.

        ❯ terraform test
        tests/basic.tftest.hcl... in progress
          run "test_resource_creation"... pass
        tests/basic.tftest.hcl... tearing down
        tests/basic.tftest.hcl... pass

     Our first test is complete: we have validated that the Terraform configuration is valid and the resource can be provisioned successfully. Next, let's learn how to perform an inspection of the resource state.

     Create resource and validate resource name

     Re-using the previous test file, we add an assert block to check whether the CodeCommit repository name starts with the string repo- and to provide an error message if the condition fails. For the assertion, we use the startswith function. See the following example:

        # basic.tftest.hcl
        variables {
          repository_name = "MyRepo"
        }

        run "test_resource_creation" {
          command = plan

          assert {
            condition     = startswith(aws_codecommit_repository.test.repository_name, "repo-")
            error_message = "CodeCommit repository name ${var.repository_name} did not start with the expected value of 'repo-****'."
          }
        }

     Now, let's assume that another module author made changes to the module by modifying the prefix from repo- to my-repo-. Here is the modified Terraform module:

        # main.tf
        variable "repository_name" {
          type = string
        }

        resource "aws_codecommit_repository" "test" {
          repository_name = format("my-repo-%s", var.repository_name)
          description     = "Test repository."
        }

     We can catch this mistake by running the terraform test command again.

        ❯ terraform test
        tests/basic.tftest.hcl... in progress
          run "test_resource_creation"... fail
        ╷
        │ Error: Test assertion failed
        │
        │   on tests/basic.tftest.hcl line 9, in run "test_resource_creation":
        │    9:     condition = startswith(aws_codecommit_repository.test.repository_name, "repo-")
        │     ├────────────────
        │     │ aws_codecommit_repository.test.repository_name is "my-repo-MyRepo"
        │
        │ CodeCommit repository name MyRepo did not start with the expected value 'repo-***'.
        ╵
        tests/basic.tftest.hcl... tearing down
        tests/basic.tftest.hcl... fail

        Failure! 0 passed, 1 failed.

     We have successfully created a unit test using assertions that validates that the resource name matches the expected value. For more examples of using assertions, see the Terraform Tests Docs. Before we proceed to the next section, don't forget to fix the repository name in the module (revert it back to repo- instead of my-repo-) and re-run your Terraform test.

     Testing variable input validation

     When developing Terraform modules, it is common to use variable validation as a contract test to validate any dependencies and restrictions. For example, AWS CodeCommit limits the repository name to 100 characters.
     A module author can use the length function to check the length of the input variable value. We are going to use Terraform test to ensure that the variable validation works effectively. First, we modify the module to use variable validation.

        # main.tf
        variable "repository_name" {
          type = string

          validation {
            condition     = length(var.repository_name) <= 100
            error_message = "The repository name must be less than or equal to 100 characters."
          }
        }

        resource "aws_codecommit_repository" "test" {
          repository_name = format("repo-%s", var.repository_name)
          description     = "Test repository."
        }

     By default, when variable validation fails during the execution of Terraform test, the Terraform test also fails. To simulate this, create a new test file and set the repository_name variable to a value longer than 100 characters.

        # var_validation.tftest.hcl
        variables {
          repository_name = "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
        }

        run "test_invalid_var" {
          command = plan
        }

     Notice that in this new test file we also set the command to plan. Why? Because variable validation runs prior to terraform apply, so we can save time and cost by skipping resource provisioning entirely. If we run this Terraform test, it will fail as expected.

        ❯ terraform test
        tests/basic.tftest.hcl... in progress
          run "test_resource_creation"... pass
        tests/basic.tftest.hcl... tearing down
        tests/basic.tftest.hcl... pass
        tests/var_validation.tftest.hcl... in progress
          run "test_invalid_var"... fail
        ╷
        │ Error: Invalid value for variable
        │
        │   on main.tf line 1:
        │    1: variable "repository_name" {
        │     ├────────────────
        │     │ var.repository_name is "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
        │
        │ The repository name must be less than or equal to 100 characters.
        │
        │ This was checked by the validation rule at main.tf:3,3-13.
        ╵
        tests/var_validation.tftest.hcl... tearing down
        tests/var_validation.tftest.hcl... fail

        Failure! 1 passed, 1 failed.

     For other module authors who might iterate on the module, we need to ensure that the validation condition is correct and will catch any problems with input values. In other words, we expect the validation condition to fail with the wrong input. This is especially important when we want to incorporate the contract test in a CI/CD pipeline. To prevent our test from failing when we intentionally pass an invalid value, we can use the expect_failures attribute. Here is the modified test file:

        # var_validation.tftest.hcl
        variables {
          repository_name = "this_is_a_repository_name_longer_than_100_characters_7rfD86rGwuqhF3TH9d3Y99r7vq6JZBZJkhw5h4eGEawBntZmvy"
        }

        run "test_invalid_var" {
          command = plan

          expect_failures = [
            var.repository_name
          ]
        }

     Now if we run the Terraform test, we will get a successful result.

        ❯ terraform test
        tests/basic.tftest.hcl... in progress
          run "test_resource_creation"... pass
        tests/basic.tftest.hcl... tearing down
        tests/basic.tftest.hcl... pass
        tests/var_validation.tftest.hcl... in progress
          run "test_invalid_var"... pass
        tests/var_validation.tftest.hcl... tearing down
        tests/var_validation.tftest.hcl... pass

        Success! 2 passed, 0 failed.

     As you can see, the expect_failures attribute is used to test negative paths (the inputs that would cause failures when passed into a module). Assertions tend to focus on positive paths (the ideal inputs).
     For an additional example of a test that validates the functionality of a completed module with multiple interconnected resources, see this example in the Terraform CI/CD and Testing on AWS Workshop.

     Orchestrating supporting resources

     In practice, end users utilize Terraform modules in conjunction with other supporting resources. For example, a CodeCommit repository is usually encrypted using an AWS Key Management Service (KMS) key. The KMS key is provided by end users to the module using a variable called kms_key_id. To test this scenario, we need to orchestrate the creation of the KMS key outside of the module. In this section we will learn how to do that. First, update the Terraform module to add an optional variable for the KMS key.

        # main.tf
        variable "repository_name" {
          type = string

          validation {
            condition     = length(var.repository_name) <= 100
            error_message = "The repository name must be less than or equal to 100 characters."
          }
        }

        variable "kms_key_id" {
          type    = string
          default = ""
        }

        resource "aws_codecommit_repository" "test" {
          repository_name = format("repo-%s", var.repository_name)
          description     = "Test repository."
          kms_key_id      = var.kms_key_id != "" ? var.kms_key_id : null
        }

     In a Terraform test, you can instruct the run block to execute another helper module. The helper module is used by the test to create the supporting resources. We will create a sub-directory called setup under the tests directory with a single kms.tf file. We also create a new test file for the KMS scenario. See the updated directory structure:

        ├── main.tf
        └── tests
            ├── setup
            │   └── kms.tf
            ├── basic.tftest.hcl
            ├── var_validation.tftest.hcl
            └── with_kms.tftest.hcl

     The kms.tf file is a helper module that creates a KMS key and provides its ARN as the output value.

        # kms.tf
        resource "aws_kms_key" "test" {
          description             = "test KMS key for CodeCommit repo"
          deletion_window_in_days = 7
        }

        output "kms_key_id" {
          value = aws_kms_key.test.arn
        }

     The new test will use two separate run blocks. The first run block (setup) executes the helper module to generate a KMS key. This is done by assigning the command apply, which will run terraform apply to generate the KMS key. The second run block (codecommit_with_kms) will then use the KMS key ARN output of the first run as the input variable passed to the main module.

        # with_kms.tftest.hcl
        run "setup" {
          command = apply

          module {
            source = "./tests/setup"
          }
        }

        run "codecommit_with_kms" {
          command = apply

          variables {
            repository_name = "MyRepo"
            kms_key_id      = run.setup.kms_key_id
          }

          assert {
            condition     = aws_codecommit_repository.test.kms_key_id != null
            error_message = "KMS key ID attribute value is null"
          }
        }

     Go ahead and run terraform init, followed by terraform test. You should get a successful result like the one below.

        ❯ terraform test
        tests/basic.tftest.hcl... in progress
          run "test_resource_creation"... pass
        tests/basic.tftest.hcl... tearing down
        tests/basic.tftest.hcl... pass
        tests/var_validation.tftest.hcl... in progress
          run "test_invalid_var"... pass
        tests/var_validation.tftest.hcl... tearing down
        tests/var_validation.tftest.hcl... pass
        tests/with_kms.tftest.hcl... in progress
          run "setup"... pass
          run "codecommit_with_kms"... pass
        tests/with_kms.tftest.hcl... tearing down
        tests/with_kms.tftest.hcl... pass

        Success! 4 passed, 0 failed.

     We have learned how to run Terraform test and develop various test scenarios. In the next section we will see how to incorporate all the tests into a CI/CD pipeline.
     Terraform Tests in CI/CD Pipelines

     Now that we have seen how Terraform test works locally, let's see how it can be leveraged to create a Terraform module validation pipeline on AWS. The following AWS services are used:

       • AWS CodeCommit – a secure, highly scalable, fully managed source control service that hosts private Git repositories.
       • AWS CodeBuild – a fully managed continuous integration service that compiles source code, runs tests, and produces ready-to-deploy software packages.
       • AWS CodePipeline – a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates.
       • Amazon Simple Storage Service (Amazon S3) – an object storage service offering industry-leading scalability, data availability, security, and performance.

     In the Terraform module validation pipeline, the following takes place:

       • A developer pushes Terraform module configuration files to a Git repository (AWS CodeCommit).
       • AWS CodePipeline begins running the pipeline. The pipeline clones the Git repo and stores the artifacts in an Amazon S3 bucket.
       • An AWS CodeBuild project configures a compute/build environment with Checkov installed from an image fetched from Docker Hub. CodePipeline passes the artifacts (Terraform module) and CodeBuild executes Checkov to run static analysis of the Terraform configuration files.
       • Another CodeBuild project configures a build environment with Terraform installed from an image fetched from Docker Hub. CodePipeline passes the artifacts (repo contents) and CodeBuild runs Terraform commands to execute the tests.

     CodeBuild uses a buildspec file to declare the build commands and relevant settings. Here is an example of the buildspec files for both CodeBuild projects:

        # Checkov
        version: 0.1
        phases:
          pre_build:
            commands:
              - echo pre_build starting
          build:
            commands:
              - echo build starting
              - echo starting checkov
              - ls
              - checkov -d .
              - echo saving checkov output
              - checkov -s -d ./ > checkov.result.txt

     In the above buildspec, Checkov is run against the root directory of the cloned CodeCommit repository. This directory contains the configuration files for the Terraform module. Checkov also saves the output to a file named checkov.result.txt for further review or handling if needed. If Checkov fails, the pipeline will fail.

        # Terraform Test
        version: 0.1
        phases:
          pre_build:
            commands:
              - terraform init
              - terraform validate
          build:
            commands:
              - terraform test

     In the above buildspec, the terraform init and terraform validate commands are used to initialize Terraform and check that the configuration is valid. Finally, the terraform test command is used to run the configured tests. If any of the Terraform tests fail, the pipeline will fail.

     For a full example of the CI/CD pipeline configuration, please refer to the Terraform CI/CD and Testing on AWS workshop. The module validation pipeline described above is meant as a starting point. In a production environment, you might want to customize it further by adding Checkov allow-list rules, linting, checks for Terraform docs, or prerequisites such as building the code used in AWS Lambda.

     Choosing various testing strategies

     At this point you may be wondering when you should use Terraform tests rather than other tools such as preconditions and postconditions, check blocks, or policy as code. The answer depends on your test type and use cases.
     Terraform test is suitable for unit tests, such as validating that resources are created according to a naming specification. Variable validations and pre/post conditions are useful for contract tests of Terraform modules, for example by raising an error when input variable values do not meet the specification. As shown in the previous section, you can also use Terraform test to ensure your contract tests are running properly. Terraform test is also suitable for integration tests where you need to create supporting resources to properly test the module functionality. Lastly, check blocks are suitable for end-to-end tests where you want to validate the infrastructure state after all resources are generated, for example to test if a website is running after an S3 bucket configured for static web hosting is created.

     When developing Terraform modules, you can run Terraform test in command = plan mode for unit and contract tests. This allows the unit and contract tests to run more quickly and cheaply, since no resources are created. You should also consider the time and cost to execute Terraform test for complex or large Terraform configurations, especially if you have multiple test scenarios. Terraform test maintains one or more state files in memory for each test file, so consider how to re-use the module's state when appropriate. Terraform test also provides test mocking, which allows you to test your module without creating the real infrastructure, as sketched below.

     Conclusion

     In this post, you learned how to use Terraform test and develop various test scenarios. You also learned how to incorporate Terraform test in a CI/CD pipeline. Lastly, we discussed various testing strategies for Terraform configurations and modules. For more information about Terraform test, we recommend the Terraform test documentation and tutorial. To get hands-on practice building a Terraform module validation pipeline and Terraform deployment pipeline, check out the Terraform CI/CD and Testing on AWS Workshop.

     Authors

     Kevon Mayers is a Solutions Architect at AWS. Kevon is a Terraform Contributor and has led multiple Terraform initiatives within AWS. Prior to joining AWS he worked as a DevOps Engineer and Developer, and before that with the GRAMMYs/The Recording Academy as a Studio Manager, Music Producer, and Audio Engineer. He also owns a professional production company, MM Productions.

     Welly Siauw is a Principal Partner Solution Architect at Amazon Web Services (AWS). He spends his day working with customers and partners, solving architectural challenges. He is passionate about service integration and orchestration, serverless and artificial intelligence (AI) and machine learning (ML). He has authored several AWS blog posts and actively leads AWS Immersion Days and Activation Days. Welly spends his free time tinkering with espresso machines and outdoor hiking. View the full article
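     To illustrate the test mocking mentioned above, here is a minimal sketch of a mocked test file for the same module. It assumes Terraform 1.7 or later, where mock_provider was introduced (a newer release than the 1.6 version discussed in this post); the file and run names are illustrative, not from the article.

        # mocked.tftest.hcl
        # Replace the real AWS provider with a mock so that command = apply
        # creates no real infrastructure and needs no AWS credentials.
        mock_provider "aws" {}

        variables {
          repository_name = "MyRepo"
        }

        run "unit_test_with_mocks" {
          command = apply

          assert {
            # repository_name is computed from the configuration itself, so the
            # assertion still exercises the module's naming logic.
            condition     = startswith(aws_codecommit_repository.test.repository_name, "repo-")
            error_message = "CodeCommit repository name did not start with the expected value 'repo-'."
          }
        }

     Because the provider is mocked, unknown computed attributes are filled with generated placeholder values, so this style of test is best reserved for logic that the module itself controls.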
  5. Background

     Ever since generative AI gained prominence in the AI field, organizations ranging from startups to large enterprises have moved to harness its power by making it an integral part of their applications, solutions, and platforms. While the true potential of generative AI lies in creating new content based on learning from existing content, it is becoming important that the content produced has a degree of specificity to a given area or domain. This blog post shows how generative AI models can be adapted to your use cases by demonstrating how to train models on Google Kubernetes Engine (GKE) using NVIDIA accelerated computing and the NVIDIA NeMo framework.

     Building generative AI models

     In the context of constructing generative AI models, high-quality data (the 'dataset') serves as a foundational element. Data in various formats, such as text, code, images, and others, is processed, enhanced, and analyzed to minimize direct effects on the model's output. Based on the model's modality, this data is fed into a model architecture to enable the model's training process. This might be text for Transformers or images for GANs (Generative Adversarial Networks). During training, the model adjusts its internal parameters so that its output matches the patterns and structures of the data. As the model learns, its performance is monitored by observing a decreasing loss on the training set, as well as improved predictions on a test set. Once the performance is no longer improving, the model is considered converged. It may then undergo further refinement, such as reinforcement learning from human feedback (RLHF). Additional hyperparameters, such as the learning rate or batch size, can be tuned to improve the rate of model learning. The process of building and customizing a model can be expedited by using a framework that offers the necessary constructs and tooling, thereby simplifying adoption.

     NVIDIA NeMo

     NVIDIA NeMo is an open-source, end-to-end platform purpose-built for developing custom, enterprise-grade generative AI models. NeMo leverages NVIDIA's state-of-the-art technology to facilitate a complete workflow, from automated distributed data processing, to training of large-scale bespoke models, to deployment and serving using infrastructure in Google Cloud. NeMo is also available for enterprise-grade deployments with NVIDIA AI Enterprise software, available on Google Cloud Marketplace. The NeMo framework approaches building AI models using a modular design to encourage data scientists, ML engineers, and developers to mix and match these core components:

       • Data Curation: extract, deduplicate, and filter information from datasets to generate high-quality training data
       • Distributed Training: advanced parallelism of training models by spreading workloads across tens of thousands of compute nodes with NVIDIA graphics processing units (GPUs)
       • Model Customization: adapt several foundational, pre-trained models to specific domains using techniques such as P-tuning, SFT (Supervised Fine-Tuning), and RLHF (Reinforcement Learning from Human Feedback)
       • Deployment: seamless integration with NVIDIA Triton Inference Server to deliver high-accuracy, low-latency, and high-throughput results

     The NeMo framework provides guardrails to honor safety and security requirements. It enables organizations to foster innovation, optimize operational efficiency, and establish easy access to software frameworks to start the generative AI journey.
     For those interested in deploying NeMo onto an HPC system that may include schedulers like the Slurm workload manager, we recommend using the ML Solution available through the Cloud HPC Toolkit.

     Training at scale using GKE

     Building and customizing models requires massive compute, quick access to memory and storage, and rapid networking. In addition, there are multiple demands across the infrastructure, ranging from scaling large models and using resources efficiently to iterating quickly, tolerating faults, and orchestrating distributed workloads. GKE allows customers to have a more consistent and robust development process by having one platform for all their workloads. As a foundation platform, GKE provides unmatched scalability and compatibility with a diverse set of hardware accelerators, including NVIDIA GPUs, bringing the best of accelerator orchestration to help significantly improve performance and reduce costs. Let's look at how GKE helps manage the underlying infrastructure:

       • Compute
         • Multi-Instance GPUs (MIG): partition a single NVIDIA H100 or A100 Tensor Core GPU into multiple instances, so each has high-bandwidth memory, cache, and compute cores
         • Time-sharing GPUs: a single physical GPU node shared by multiple containers to improve utilization and save running costs
       • Storage
         • Local SSD: for high throughput and I/O requirements
         • Cloud Storage FUSE: allows file-like operations on objects
       • Networking
         • GPUDirect-TCPX NCCL plug-in: a transport-layer plugin to enable direct GPU-to-NIC transfers during NCCL communication, improving network performance
         • Google Virtual Network Interface Card (gVNIC): to increase network performance between GPU nodes
       • Queuing: a Kubernetes-native job queueing system to orchestrate job execution to completion in a resource-constrained environment

     GKE is widely embraced by communities, including ISVs (Independent Software Vendors), to land their tools, libraries, and frameworks. GKE democratizes infrastructure by letting teams of different sizes build, train, and deploy AI models.

     Solution architecture

     Current industry trends in the AI/ML space indicate that more computational power improves models significantly. GKE unleashes this power of Google Cloud's products and services, along with NVIDIA GPUs, to train and serve models at industry-leading scale. The reference architecture comprises the following major components, tools, and common services used to train the NeMo large language model using GKE:

       • A GKE cluster, set up in a regional or zonal location, consisting of two node pools: a default node pool to manage common services such as DNS pods and custom controllers, and a managed node pool with A3 nodes to run the workloads.
       • A3 nodes, each with 16 local SSDs, 8x NVIDIA H100 Tensor Core GPUs, and the associated drivers. On each node, the Filestore CSI driver is enabled to access fully managed NFS storage, and Cloud Storage FUSE is enabled to access Google Cloud Storage as a file system.
       • Kueue batching for workload management; this is recommended for a larger setup used by multiple teams.
       • Filestore, mounted on each node, to store outputs and interim and final logs for viewing training performance.
       • A Cloud Storage bucket that contains the training data.
       • NVIDIA NGC, which hosts the NeMo framework's training image.
       • Training logs mounted on Filestore, which can be viewed using TensorBoard to examine the training loss and training step times.
       • Common services such as Cloud Ops to view logs, IAM to manage security, and Terraform to deploy the entire setup (sketched below).
     An end-to-end walkthrough is available in a GitHub repository at https://github.com/GoogleCloudPlatform/nvidia-nemo-on-gke. The walkthrough provides detailed instructions to set up the above solution in a Google Cloud project and pre-train NVIDIA's NeMo Megatron GPT using the NeMo framework.

     Extend further

     In scenarios where there are massive amounts of structured data, BigQuery is commonly used by enterprises as their central data warehousing platform. There are techniques to export data into Cloud Storage to train the model. If the data is not available in the intended form, Dataflow can be used to read, transform, and write the data back to BigQuery.

     Conclusion

     By leveraging GKE, organizations can focus on developing and deploying their models to grow their business, without worrying about the underlying infrastructure. NVIDIA NeMo is highly suited to building custom generative AI models. This combination provides the scalability, reliability, and ease of use needed to train and serve models. To learn more about GKE, click here. To understand more about NVIDIA NeMo, click here. If you are coming to NVIDIA GTC, please visit Google Cloud at booth #808 to see this in action. View the full article
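     Since the walkthrough uses Terraform to deploy the setup, the A3 node pool might be declared roughly as follows. This is a minimal sketch, not code from the linked repository: the cluster reference, pool name, location, node count, and label are assumed, illustrative values.

        # a3_node_pool.tf -- illustrative sketch only; names and values are assumptions.
        resource "google_container_node_pool" "a3_h100" {
          name       = "a3-h100-pool"
          cluster    = google_container_cluster.training.name   # assumed cluster resource
          location   = "us-central1"
          node_count = 2

          node_config {
            # Each a3-highgpu-8g node comes with 8x NVIDIA H100 Tensor Core GPUs
            # and 16 local SSDs, as described in the reference architecture above.
            machine_type = "a3-highgpu-8g"

            # Workloads are steered onto this pool via node labels; Kueue and
            # scheduling configuration are omitted from this sketch.
            labels = {
              workload = "nemo-training"
            }
          }
        }

     A managed pool like this sits alongside the default pool that runs common services, matching the two-node-pool layout described in the architecture.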
  6. Behave is a Python-based behavior-driven development (BDD) framework for writing human-readable tests that describe the expected behavior of software systems. Terraform, on the other hand, is an infrastructure-as-code (IaC) tool that streamlines the management of infrastructure by enabling developers to define resources and configurations in a declarative manner. By combining Behave's BDD approach with Terraform, you can ensure that infrastructure behaves as expected under various conditions. This integration facilitates early detection of issues and improves the reliability of infrastructure code. Using Behave for Terraform Testing Testing Terraform configurations with Behave involves a series of structured steps: View the full article
  7. Today, Amazon Web Services (AWS) announced the launch of the AWS Well-Architected Framework DevOps Guidance. The AWS DevOps Guidance introduces the AWS DevOps Sagas—a collection of modern capabilities that together form a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. Taking the learnings from Amazon’s own transformation journey and our experience managing global cloud services, the AWS DevOps Guidance was built to equip organizations of all sizes with best practice culture, processes, and technical capabilities that help to deliver business value and applications more securely and at a higher velocity. A Glimpse into Amazon’s DevOps Transformation In the early 2000s, Amazon went through its own DevOps transformation which led to an online bookstore forming the AWS cloud computing division. Today, AWS provides a wide range of products and services for global customers that are powered by that same innovative DevOps approach. Due to the positive effects of this transformation, AWS recognizes the significance of DevOps and has been at the forefront of its adoption and implementation. Amazon’s own journey, along with the collective experience gained from assisting customers as they modernize and migrate to the cloud, provided insight into the capabilities which we believe make DevOps adoption successful. With these learnings, we created the DevOps Sagas to help our customers sustainably adopt and practice DevOps through the implementation of an interconnected set of capabilities. Each DevOps Saga includes prescriptive guidance for capabilities that provide indicators of success, metrics to measure, and common anti-patterns to avoid. Introducing The DevOps Sagas The DevOps Sagas are core domains within the software delivery process that collectively form AWS DevOps best practices. Together, they encompass a collection of modern capabilities representing a comprehensive approach to designing, developing, securing, and efficiently operating software at cloud scale. You can use the DevOps Sagas as a common definition of what DevOps means to your organization by aligning on a shared understanding within your organization and to consistently measure DevOps adoption over time. The 5 DevOps Sagas are: Organizational Adoption Saga: Inspires the formation of a customer-centric, adaptive culture focused on optimizing people-driven processes, personal and professional development, and improving developer experience to set the foundation for successful DevOps adoption. Development Lifecycle Saga: Aims to enhance the organization’s capacity to develop, review, and deploy workloads swiftly and securely. It leverages feedback loops, consistent deployment methods, and an ‘everything-as-code’ approach to attain efficiency in deployment. Quality Assurance Saga: Advocates for a proactive, test-first methodology integrated into the development process to ensure that applications are well-architected by design, secure, cost-efficient, sustainable, and delivered with increased agility through automation. Automated Governance Saga: Facilitates directive, detective, preventive, and responsive measures at all stages of the development process. It emphasizes risk management, business process adherence, and application and infrastructure compliance at scale through automated processes, policies, and guardrails. 
Observability Saga: Presents an approach to incorporating observability within environment and workloads, allowing teams to detect and address issues, improve performance, reduce costs, and ensure alignment with business objectives and customer needs. Who should use the AWS DevOps Guidance? We recognize that every organization is unique and that there is no one-size-fits-all approach to practicing DevOps. The recommendations and examples provided can be tailored to suit your organization’s environment, quality, and security needs. The AWS DevOps Guidance is designed for a wide range of professionals and organizations, including startups exploring DevOps for the first time, established enterprises refining their processes, public sector companies, cloud-native businesses, and customers migrating to the AWS Cloud. Whether you are steering strategic direction as a Chief Technology Officer (CTO) or Chief Information Security Officer (CISO), a developer or architect actively engaged in designing and deploying workloads, or in a compliance role overseeing quality assurance, auditing, or governance, this guidance is tailored to help you. Next Steps With the release of the AWS DevOps Guidance, we encourage you, our customers, to download and read the document, as well as implement and test your workloads in accordance with the recommendations within. Use the AWS DevOps Guidance in tandem with the AWS Well-Architected Framework to conduct an assessment of your organization and individual workload’s adherence to DevOps best practices to pinpoint areas of strength and opportunities for improvement. Collaborate with your teams – from developers to operations and decision-makers – to share insights from your assessment. Use the insights gained from the AWS DevOps Guidance to prioritize areas of improvement and iteratively improve your DevOps capabilities. Find the AWS DevOps Guidance on the AWS Well-Architected website or contact your AWS account team for more information. As with the AWS Well-Architected Framework and other industry and technology guidance, we recommend leveraging the AWS DevOps Guidance early and often – as you approach architectural and service design decisions, and whenever you carry out Well-Architected reviews. As you use the AWS DevOps Guidance, we would appreciate your comments and feedback to help us improve as best practices and technology evolve. We will continually refresh the content as we identify new best practices, metrics, and common scenarios. View the full article
  8. Today, we are excited to announce the general availability of HashiCorp Terraform 1.6, which is ready for download and immediately available for use in Terraform Cloud. This release adds numerous features to enhance developer productivity and practitioner workflows, including a powerful new testing framework, improvements to config-driven import, enhancements for Terraform Cloud CLI workflows, and more. Let’s take a look at what’s new... View the full article
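     As a rough sketch of the config-driven import improvements mentioned above (the resource and bucket names here are illustrative assumptions, not taken from the announcement), an import block ties an existing object to a resource address, and terraform plan -generate-config-out can draft the matching configuration:

        # import.tf -- minimal sketch; the bucket name is an assumed example.
        import {
          to = aws_s3_bucket.existing
          id = "my-existing-bucket"
        }

        # Either write the resource block by hand, or let Terraform draft it with:
        #   terraform plan -generate-config-out=generated.tf
        resource "aws_s3_bucket" "existing" {
          bucket = "my-existing-bucket"
        }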
  9. We are excited to announce the availability of improved AWS Well-Architected Framework guidance. In this update, we have made changes across all six pillars of the framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. In this release, we have made the implementation guidance for the new and updated best practices more prescriptive, including enhanced recommendations and steps on reusable architecture patterns targeting specific business outcomes in the Amazon Web Services (AWS) Cloud... View the full article
  10. Tray.io's Universal Automation Cloud service leverages an existing Merlin AI engine to identify the most efficient way to automate a workflow. View the full article
  11. AWS Glue for Apache Spark now supports three open source data lake storage frameworks: Apache Hudi, Apache Iceberg, and Linux Foundation Delta Lake. These frameworks allow you to read and write data in Amazon Simple Storage Service (Amazon S3) in a transactionally consistent manner. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. This feature removes the need to install a separate connector and reduces the configuration steps required to use these frameworks in AWS Glue for Apache Spark jobs. View the full article
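     A hedged sketch of how this might look when defining a Glue for Apache Spark job in Terraform: the --datalake-formats job argument asks Glue to load the chosen framework's libraries. The job name, role, and script location below are assumptions, not values from the announcement.

        # glue_iceberg_job.tf -- illustrative sketch; names and ARNs are assumed.
        resource "aws_glue_job" "iceberg_etl" {
          name         = "iceberg-etl"
          role_arn     = aws_iam_role.glue_job.arn   # assumed IAM role resource
          glue_version = "4.0"

          command {
            name            = "glueetl"
            script_location = "s3://example-bucket/scripts/etl.py"
          }

          default_arguments = {
            # Load the built-in Apache Iceberg libraries; "hudi" or "delta"
            # select the other supported frameworks.
            "--datalake-formats" = "iceberg"
          }
        }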
  12. Codenotary this week announced it has integrated support for the Supply-chain Levels for Software Artifacts (SLSA) framework in its free notarization and verification service for ensuring the integrity of code. Moshe Bar, Codenotary CEO, said that as the first application security platform to attain SLSA compliance, the company is making it easier for organizations to […] View the full article
  13. AppSmith has added support for Git repositories to an open source framework for building custom applications using a low-code platform based on the JavaScript programming language. Rishabh Kaul, head of marketing for AppSmith, said the Git support will make it simpler for developers to manage version control as they iteratively build applications using a set […] The post AppSmith Adds Git Support to Low-Code App Dev Framework appeared first on DevOps.com. View the full article
  14. CloudWatch Synthetics now supports canary scripts written in the Python programming language with the Selenium open source web automation testing framework. This gives you more choice of programming language and framework when creating canaries in CloudWatch Synthetics. View the full article
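     A minimal sketch of what provisioning such a canary could look like in Terraform; the runtime version is assumed to be the Python/Selenium runtime, and the bucket, role, file, and handler names are illustrative assumptions.

        # canary.tf -- illustrative sketch; names, ARNs, and the packaged script are assumed.
        resource "aws_synthetics_canary" "homepage_check" {
          name                 = "homepage-check"
          artifact_s3_location = "s3://example-canary-artifacts/"
          execution_role_arn   = aws_iam_role.canary.arn   # assumed IAM role resource
          runtime_version      = "syn-python-selenium-1.0"
          handler              = "canary.handler"          # python/canary.py defines handler()
          zip_file             = "canary.zip"              # local archive containing the script
          start_canary         = true

          schedule {
            expression = "rate(5 minutes)"
          }
        }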
  15. The AWS Solutions team recently updated Machine to Cloud Connectivity Framework, a solution that provides secure factory equipment connectivity to the AWS Cloud. It features secure connectivity, fast and robust data ingestion, and highly reliable and durable storage of factory equipment data. The solution is easy to deploy and can help reduce machine downtime and increase factory efficiency. View the full article
  16. RESTON, Va. – Nov. 10, 2020 – SAFE Identity and its healthcare industry-led Policy Management Authority (PMA) have achieved a major milestone in their effort to enable a standards-based, interoperable Trust Framework for digital identities across all stakeholders in the highly distributed healthcare industry. Today, the healthcare industry consortium and certification body announced it has published the new SAFE Identity […] The post SAFE Identity Achieves Major Standards Milestone in Cross-Industry Effort to Complete Interoperable Trust Framework for Digital Identities for Healthcare appeared first on DevOps.com. View the full article
  17. The practice of data science relies extensively on machine learning frameworks. This could be for many reasons, but largely to automate the processes that drive a business forward. Framework-focused solutions mean data scientists don't always need extensive experience in coding and programming languages, and can instead apply their expertise to solving the bigger problems on their plate. Reports show that 85% of data pros have used at least one ML framework... View the full article