All Activity
- Last week
-
Announcing Public Preview of Streaming Table and Materialized View Sharing
We are thrilled to announce that the sharing of materialized views and streaming tables is now available in Public Preview. Streaming Tables (STs) continuously ingest… View the full article
-
Amazon Bedrock Data Automation now supports modality controls, hyperlinks and larger documents
Amazon Bedrock Data Automation (BDA) now supports modality enablement, modality routing by file type, extraction of embedded hyperlinks when processing documents in Standard Output, and an increased overall document page limit of 3,000 pages. These new features give you more control over how your multimodal content is processed and improve BDA's overall document extraction capabilities.

With Modality Enablement and Routing, you can configure which modalities (Document, Image, Audio, Video) should be enabled for a given project and manually specify the modality routing for specific file types. JPEG/JPG and PNG files can be processed as either Images or Documents based on your specific use case requirements. Similarly, MP4/M4V and MOV files can be processed as either video files or audio files, allowing you to choose the optimal processing path for your content.

Embedded Hyperlink Support enables BDA to detect and return embedded hyperlinks found in PDFs as part of the BDA standard output. This feature enhances the information extraction capabilities from documents, preserving valuable link references for applications such as knowledge bases, research tools, and content indexing systems.

Lastly, BDA now supports processing documents up to 3,000 pages per document, doubling the previous limit of 1,500 pages. This increased limit allows you to process larger documents without splitting them, simplifying workflows for enterprises dealing with long documents or document packets.

Amazon Bedrock Data Automation is generally available in the US West (Oregon) and US East (N. Virginia) AWS Regions. To learn more, visit the Bedrock Data Automation page or view documentation. View the full article
-
Amazon EventBridge cross-account event delivery now in the AWS GovCloud (US) Regions
Starting today, in the AWS GovCloud (US-East) and AWS GovCloud (US-West) Regions, you can now deliver events from an Amazon EventBridge Event Bus directly to AWS services in another account. Using multiple accounts can improve security and streamline business processes while reducing the overall cost and complexity of your architecture. Amazon EventBridge Event Bus is a serverless event broker that enables you to create scalable event-driven applications by routing events between your own applications, third-party SaaS applications, and other AWS services. This launch allows you to directly target services in another account, without the need for additional infrastructure such as an intermediary EventBridge Event Bus or Lambda function, simplifying your architecture and reducing cost. For example, you can now route events from your EventBridge Event Bus directly to a different team's SQS queue in a different account. The team receiving events does not need to learn about or maintain EventBridge resources and simply needs to grant IAM permissions to provide access to the queue. Events can be delivered cross-account to EventBridge targets that support resource-based IAM policies such as Amazon SQS, AWS Lambda, Amazon Kinesis Data Streams, Amazon SNS, and Amazon API Gateway. In addition to the AWS GovCloud (US) Regions, direct delivery to cross-account targets is available in all commercial AWS Regions. To learn more, please read our blog post or visit our documentation. Pricing information is available on the EventBridge pricing page. View the full article
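To make the SQS example concrete, here is a minimal sketch of the receiving side: the queue owner allows the EventBridge service principal to send messages, scoped to the sending account's rule. The queue name, account IDs, and rule ARN are hypothetical placeholders, so adapt them before use.

```bash
# Hypothetical resource policy for the receiving queue (account 222222222222):
# it lets EventBridge deliver events published by a specific rule in the
# sending account (111111111111). Attach it as the queue's Policy attribute.
cat > queue-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowEventBridgeCrossAccountDelivery",
    "Effect": "Allow",
    "Principal": { "Service": "events.amazonaws.com" },
    "Action": "sqs:SendMessage",
    "Resource": "arn:aws-us-gov:sqs:us-gov-west-1:222222222222:orders-events",
    "Condition": {
      "ArnEquals": {
        "aws:SourceArn": "arn:aws-us-gov:events:us-gov-west-1:111111111111:rule/default/orders-to-partner"
      }
    }
  }]
}
EOF
```

Scoping the condition to the rule ARN keeps the grant narrow, so only that specific rule in the other account can write to the queue.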
-
AWS Resource Explorer now supports AWS PrivateLink
AWS Resource Explorer now supports AWS PrivateLink in all commercial AWS Regions, allowing you to search for and discover your AWS resources within your Amazon Virtual Private Cloud (VPC) without traversing the public internet. With AWS Resource Explorer you can search for and discover your AWS resources across AWS Regions and accounts in your organization, either using the AWS Resource Explorer console, the AWS Command Line Interface (AWS CLI), the AWS SDKs, or the unified search bar from wherever you are in the AWS Management Console. For more information about the AWS Regions where AWS Resource Explorer is available, see the AWS Region table. To turn on AWS Resource Explorer, visit the AWS Resource Explorer console. Read about getting started in our AWS Resource Explorer documentation, or explore the AWS Resource Explorer product page. View the full article
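As a rough sketch, using Resource Explorer over PrivateLink comes down to creating an interface VPC endpoint for the service in your VPC. The service name below follows the usual com.amazonaws.&lt;region&gt;.&lt;service&gt; pattern and should be verified against the Resource Explorer documentation; the VPC, subnet, and security group IDs are placeholders.

```bash
# Hypothetical interface endpoint so Resource Explorer API calls stay inside the VPC.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.resource-explorer-2 \
  --subnet-ids subnet-0aaa1111bbb22222c \
  --security-group-ids sg-0123456789abcdef0 \
  --private-dns-enabled

# With private DNS enabled, regular CLI/SDK calls resolve to the private endpoint.
aws resource-explorer-2 search --query-string "service:ec2 region:us-east-1"
```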
-
Amazon Q Developer operational investigations (preview) now available in additional regions
Starting today, Amazon Q Developer operational investigations is available in preview in 11 additional regions. With this launch, Amazon Q Developer operational investigations is now available in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Europe (Stockholm), Europe (Spain), Asia Pacific (Tokyo), Asia Pacific (Hong Kong), Asia Pacific (Sydney), Asia Pacific (Singapore), and Asia Pacific (Mumbai). Amazon Q Developer helps you accelerate operational investigations across your AWS environment in just a fraction of the time. With a deep understanding of your AWS cloud environment and resources, Amazon Q Developer looks for anomalies in your environment, surfaces related signals for you to explore, identifies potential root-cause hypotheses, and suggests next steps to help you remediate issues faster. The new operational investigation capability within Amazon Q Developer is available at no additional cost during preview. To learn more, see getting started and best practices documentation. View the full article
-
AWS Resource Groups now supports 160 more resource types
Today, AWS Resource Groups is adding support for an additional 160 resource types for tag-based Resource Groups. Customers can now use Resource Groups to group and manage resources from services such as Amazon CodeCatalyst and AWS Chatbot. AWS Resource Groups enables you to model, manage, and automate tasks on large numbers of AWS resources by using tags to logically group your resources. You can create logical collections of resources such as applications, projects, and cost centers, and manage them on dimensions such as cost, performance, and compliance in AWS services such as myApplications, AWS Systems Manager, and Amazon CloudWatch. The expanded resource type coverage is available in all AWS Regions, including the AWS GovCloud (US) Regions. You can access AWS Resource Groups through the AWS Management Console, the AWS SDK APIs, and the AWS CLI. For more information about grouping resources, see the AWS Resource Groups user guide and the list of supported resource types. To get started, visit the AWS Resource Groups console. View the full article
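For orientation, a minimal sketch of creating a tag-based group from the CLI; the group name and tag key/value below are made up for the example.

```bash
# Hypothetical: group every supported resource tagged project=unicorn.
aws resource-groups create-group \
  --name unicorn-project \
  --resource-query '{
    "Type": "TAG_FILTERS_1_0",
    "Query": "{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"project\",\"Values\":[\"unicorn\"]}]}"
  }'

# Inspect what the tag query matched.
aws resource-groups list-group-resources --group unicorn-project
```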
-
Big Thinkers: Douglas Crockford – The Man Behind JSON and JSLint
Douglas Crockford is a renowned figure in the programming world, celebrated for his pivotal contributions to JavaScript and the creation of JSON (JavaScript Object Notation). While not traditionally categorized under cloud computing, his innovations have profoundly influenced web technologies that underpin modern cloud services. This profile delves into Crockford’s journey, highlighting his impact on the […] The article Big Thinkers: Douglas Crockford – The Man Behind JSON and JSLint was originally published on Build5Nines. To stay up-to-date, Subscribe to the Build5Nines Newsletter. View the full article
-
How Does AWS Handle Big Data for ML?
In this blog, we will explore how AWS handles big data for ML. Maintaining high-quality data and records comes with significant challenges, but the AWS tools for processing and analysing big data come in handy. Let us explore how AWS achieves this!

Major Challenges In Handling Big Data For Businesses

Today, businesses are moving towards data-driven decision-making. From analysing customer interactions and transactions to IoT sensor data and social media analysis, data is the starting point for extracting meaningful insights. The main challenges in handling big data are:
- Storing and retrieving data without performance issues that affect scalability.
- Transforming raw data into the format the ML model expects.
- Training the ML model, which demands significant compute power and optimisation.
- Protecting sensitive data while complying with regulatory requirements.

This is where machine learning enters the picture. Handling well-structured big data is one of the biggest struggles for any organisation, and without the right ML infrastructure these challenges directly affect the efficiency and pace of ML adoption, making it difficult for businesses to innovate.

AWS Solving Big Data Challenges For ML

Amazon Web Services offers a fully managed, scalable, and cost-efficient cloud ecosystem that lets organisations and businesses securely take care of their storage and operational functions. The whole ecosystem is built to simplify data handling for ML applications with tools such as Amazon S3, AWS Glue, Amazon EMR, Amazon SageMaker and others that streamline data storage, processing, and model training. This allows businesses to focus on gathering and analysing insights rather than managing infrastructure over and over.

Apart from the other dedicated AWS certifications, opting for the AWS Certified Machine Learning Associate certification (MLA-C01) helps learners master cloud technology and advance their careers in cloud-based ML. It is also a clear path to understanding how to handle big data for machine learning: the certification validates skills in implementing and managing machine learning workloads on AWS, from data preparation and feature engineering to model training and deployment.

AWS Cloud Storage Tools For Big Data

AWS offers an array of tools and services specifically designed to address storage, processing, and transformation, and they also contribute to machine learning workflows. Here is a categorisation of AWS services for big data and their functionality.

For Data Storage

* Amazon S3 (Simple Storage Service)
Amazon S3 is the backbone for data lakes, offering virtually unlimited storage with high scalability and durability (99.999999999%). It can store structured, semi-structured, and unstructured data in its native format, which makes it an ideal choice for building centralised data lakes and for storing both raw and processed big data. It offers multiple storage classes for cost optimisation based on access frequency, lifecycle management policies that transition data between storage tiers, and native integration with analytical tools such as Amazon Athena and Redshift Spectrum for querying data directly from S3.

* Amazon Redshift
A fully managed, scalable cloud data warehouse designed for running complex SQL queries on large datasets, best suited for analytical workloads that require high performance and scalability. Redshift uses massively parallel processing to enable fast query execution, and its columnar storage compresses and retrieves data efficiently. It also integrates with Amazon SageMaker, so ML models can be built and trained directly from the warehouse.

* AWS Lake Formation
AWS Lake Formation is an easy-to-set-up service for building secure data lakes: it collects and catalogues data from different sources into Amazon S3, automates schema discovery and metadata management, and centralises the security policies that control access. This simplifies the creation of secure, scalable data lakes.

Data Processing and Transformation

* AWS Glue
AWS Glue is a serverless ETL (Extract, Transform, Load) service that prepares and transforms data for analytics and machine learning. It automates ETL workflows and prepares big data for analytics and ML pipelines. The built-in Data Catalog manages metadata, the Apache Spark integration distributes data processing, and it supports both batch and streaming ETL jobs.

* Amazon EMR (Elastic MapReduce)
A managed Hadoop framework that processes large datasets in a distributed fashion using open-source tools such as Spark, Hive and Presto. It scales clusters dynamically to match workload needs and integrates seamlessly with S3 for data access, making it an ideal tool for large-scale distributed computations.

* AWS Data Pipeline
A web service that automates the movement and transformation of data between AWS services and on-premises systems. Its customisable workflows include retry mechanisms, and it integrates easily with services such as Redshift and DynamoDB. It can automate recurring data workflows such as backups and transformations.

Model Training and Deployment

* Amazon SageMaker
A fully managed service that simplifies building, training, tuning, and deploying machine learning models at scale. Amazon SageMaker offers built-in algorithms optimised for training on big data, managed Jupyter notebooks for experimentation, and seamless integration with services such as S3, Glue and Redshift, which makes it a natural choice for end-to-end ML workflow management.

* AWS Lambda
A serverless compute service that runs code in response to events without provisioning servers. Lambda's event-driven triggers can drive ML inference at very high request volumes, and it supports real-time inference and processing steps in ML pipelines.

* Amazon EC2 (Elastic Compute Cloud)
Amazon EC2 provides resizable compute capacity in the cloud for running custom ML models. It offers a wide range of instance types optimised for ML training and the flexibility to install custom frameworks and libraries. It is mainly used for high-performance training jobs that require specialised hardware such as GPUs.

Data Integration

* Amazon Kinesis
A streaming service for high-velocity data, Amazon Kinesis handles use cases such as log analysis, IoT telemetry, and event tracking, and it scales automatically to accommodate varying workloads.

* AWS Database Migration Service
AWS Database Migration Service simplifies database migration to AWS with minimal downtime and supports heterogeneous migrations between different database engines.

Used effectively, these AWS tools form a complete, comprehensive ecosystem for handling big data challenges across storage, processing, transformation, and machine learning.
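To make the S3 lifecycle idea above concrete, here is a minimal sketch; the bucket name and prefix are hypothetical, and the day thresholds are only an illustration of transitioning ageing training data to cheaper tiers.

```bash
# Hypothetical lifecycle rule: objects under raw/ move to Intelligent-Tiering
# after 30 days and to Glacier Deep Archive after 180 days.
aws s3api put-bucket-lifecycle-configuration \
  --bucket ml-datalake-example \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "age-out-raw-training-data",
      "Status": "Enabled",
      "Filter": { "Prefix": "raw/" },
      "Transitions": [
        { "Days": 30,  "StorageClass": "INTELLIGENT_TIERING" },
        { "Days": 180, "StorageClass": "DEEP_ARCHIVE" }
      ]
    }]
  }'
```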
AWS Governance & Monitoring for Big Data

As AWS projects scale and grow more complex, governance and monitoring become essential to ensure optimised performance, cost control, and data security. Here are a few tools that help organisations monitor their infrastructure, manage resource usage, and maintain compliance conveniently.

Amazon CloudWatch – For Metrics & Monitoring
Provides real-time monitoring for AWS resources, applications, and services, collecting and tracking metrics such as CPU usage, memory utilisation, and disk I/O from services like EC2, S3, SageMaker, and Lambda.

AWS CloudTrail – For Governance & Auditing
AWS CloudTrail provides visibility into the API calls made within the AWS account, serving as a central audit trail for security and operations that records every action.

AWS Cost Explorer – For Cost & Usage Analysis
AWS Cost Explorer is a budgeting and cost visualisation tool that helps businesses understand their AWS spend and optimise it effectively.

Best Practices to Manage Big Data with MLA-C01

Managing big data is a truly critical skill in every sector and industry, and it is validated by the AWS Certified Machine Learning Associate (MLA-C01) certification. The following best practices should be understood and followed to optimise performance and cost and to ensure data security and scalability across ML workflows.

- Leverage S3 storage classes (Standard, Intelligent-Tiering, Glacier, etc.) to optimise cost based on how the data is accessed, and configure lifecycle policies to automatically transition older data into cost-effective tiers.
- Secure the data with IAM and encryption: implement IAM roles with least-privilege access, use S3 bucket policies for fine-grained control, enable server-side encryption, and consider AWS KMS for key management.
- Store data in a columnar format such as Parquet or ORC to reduce storage size and improve read performance, and optimise partitioning strategies for faster query performance in analytics and ML pipelines.
- Monitor and audit with Amazon CloudWatch for storage metrics and AWS CloudTrail to track API usage and identify anomalous access to data and configuration.
- Use S3 versioning for backup and recovery, and design workflows that account for eventual consistency.
- Combine AWS Lambda or Step Functions with S3 to automate scalable data-processing tasks.

The core of the MLA-C01 certification aligns with these practices, contributing to preparing data, implementing ML solutions, and maintaining operational excellence in machine learning projects.

To Sum Up

In this blog, we saw how AWS handles big data in real time for data processing, transformation, management, storage, and analysis, all of which contributes to training machine learning models on big data. Its built-in ecosystem enables businesses to handle big data end to end. Business and cloud enthusiasts who want to explore big data with minimal ML training can start with the AWS Certified Machine Learning Associate Certification. We have dedicated practice tests for MLA-C01, and for further hands-on learning, check out our available sandboxes and hands-on labs. Get started with our practice tests and level up your game in training big data for ML in organisations. View the full article
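Along the same lines, the encryption best practice above can be applied with a single call; the bucket name and KMS key ARN below are placeholders.

```bash
# Hypothetical: default server-side encryption with a customer-managed KMS key,
# with an S3 Bucket Key enabled to reduce KMS request costs.
aws s3api put-bucket-encryption \
  --bucket ml-datalake-example \
  --server-side-encryption-configuration '{
    "Rules": [{
      "ApplyServerSideEncryptionByDefault": {
        "SSEAlgorithm": "aws:kms",
        "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555"
      },
      "BucketKeyEnabled": true
    }]
  }'
```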
-
Patient-Centric Care: Advancements in Digital Health Tools
The healthcare landscape is changing rapidly, emphasizing patient-centered care and personalized health experiences. View the full article
-
Gen AI-Powered Command Center
The Challenge: Fragmented Data and Delayed Decision-Making
Energy companies grapple with a pervasive challenge: data silos. These isolated information systems fragment critical data across various… View the full article
-
Linux Tutorials: My Experience – Top Things to Do After Installing Ubuntu 25.04
The post My Experience – Top Things to Do After Installing Ubuntu 25.04 first appeared on Tecmint: Linux Howtos, Tutorials & Guides. So, you’ve just installed Ubuntu 25.04 “Plucky Puffin” on your computer—congrats! I recently did the same, and let me tell… View the full article
-
Amazon Connect agent workspace expands capabilities for third-party applications, including contact-related actions
The Amazon Connect agent workspace now supports additional capabilities for third-party applications, including the ability to make outbound calls; accept, transfer, and clear contacts; and update agent status. These enhancements allow you to integrate applications that give agents more intuitive workflows. For example, agents can now initiate one-click outbound calls from a custom-built call history interface that presents their most recent customer interactions. Third-party applications are available in the following AWS Regions: US East (N. Virginia), US West (Oregon), Africa (Cape Town), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), and Europe (London). To learn more and get started, see our admin guide and developer guide. View the full article
-
AWS AppSync Events now supports data source integrations for channel namespaces
Starting today, AWS AppSync Events, a fully managed service for serverless WebSocket APIs with full connection management, supports data source integrations for channel namespaces. This new feature enables developers to associate AWS Lambda functions, Amazon DynamoDB tables, Amazon Aurora databases, and other data sources with channel namespace handlers to process published events and subscription requests. Developers can now connect directly to Lambda functions without writing code and leverage both request/response and event modes for synchronous and asynchronous operations.

With these new capabilities, developers can create sophisticated event processing workflows by transforming and filtering published events using Lambda functions, or save batches of events to DynamoDB using the new AppSyncJS batch utilities for DynamoDB. This integration enables complex interactive flows, making it easier for developers to build rich, real-time applications with features like data validation, event transformation, and persistent storage of events. By simplifying the architecture of real-time applications, this enhancement significantly reduces development time and operational overhead for front-end web and mobile development.

This feature is now available in all AWS Regions where AWS AppSync is offered, providing developers worldwide with access to these powerful new integration capabilities. A new AppSync Events integration in Powertools for AWS Lambda is also available to make it easier to write your Lambda functions. To learn more about AWS AppSync Events and channel namespace integrations, visit the launch blog post, the AWS AppSync documentation, and the Powertools for Lambda documentation (TypeScript, Python, .NET). You can get started with these new features through the AWS AppSync console. View the full article
-
Amazon VPC Reachability Analyzer and Amazon VPC Network Access Analyzer are now available in Europe (Spain) Region
With this launch, VPC Reachability Analyzer and VPC Network Access Analyzer are now available in the Europe (Spain) Region. VPC Reachability Analyzer allows you to diagnose network reachability between a source resource and a destination resource in your virtual private clouds (VPCs) by analyzing your network configurations. For example, Reachability Analyzer can help you identify a missing entry in your VPC route table that is blocking network reachability between an EC2 instance in Account A and an EC2 instance in Account B in your AWS Organization. VPC Network Access Analyzer allows you to identify unintended network access to your resources on AWS. Using Network Access Analyzer, you can verify whether network access for your VPC resources meets your security and compliance guidelines. For example, you can create a scope to verify that the VPCs used by your Finance team are separate, distinct, and unreachable from the VPCs used by your Development team. For more information on features, visit the documentation for VPC Reachability Analyzer and VPC Network Access Analyzer. For pricing details, refer to the Network Analysis tab on the Amazon VPC Pricing Page. View the full article
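For a feel of the Reachability Analyzer workflow, here is a hedged CLI sketch; the instance IDs are placeholders and the flag names should be checked against the current EC2 CLI reference.

```bash
# Hypothetical: analyze TCP/443 reachability from an instance in Account A
# to an instance shared from Account B.
PATH_ID=$(aws ec2 create-network-insights-path \
  --source i-0aaaaaaaaaaaaaaaa \
  --destination i-0bbbbbbbbbbbbbbbb \
  --protocol tcp \
  --destination-port 443 \
  --query 'NetworkInsightsPath.NetworkInsightsPathId' --output text)

ANALYSIS_ID=$(aws ec2 start-network-insights-analysis \
  --network-insights-path-id "$PATH_ID" \
  --query 'NetworkInsightsAnalysis.NetworkInsightsAnalysisId' --output text)

# Poll for the verdict; the analysis also reports the component blocking the path, if any.
aws ec2 describe-network-insights-analyses \
  --network-insights-analysis-ids "$ANALYSIS_ID" \
  --query 'NetworkInsightsAnalyses[0].[Status,NetworkPathFound]'
```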
-
Amazon SageMaker Lakehouse now supports attribute based access control
Amazon SageMaker Lakehouse now supports attribute-based access control (ABAC), using AWS Identity and Access Management (IAM) principal and session tags to simplify data access, grant creation, and maintenance. With ABAC, you can manage permissions using dynamic business attributes associated with user identities.

Previously, SageMaker Lakehouse granted access to lakehouse databases and tables by directly assigning permissions to specific principals such as IAM users and IAM roles, a process that could quickly become unwieldy as the number of users grew. ABAC now allows administrators to grant permissions on a resource with conditions that specify user attribute keys and values. Any IAM principal or IAM role with matching principal or session tag keys and values automatically has access to the resource, making permission management more efficient. You can use ABAC through the AWS Lake Formation console to provide access to IAM users and IAM roles for both in-account and cross-account scenarios.

For instance, rather than creating individual policies for each developer, administrators can now simply assign an IAM tag with the key "team" and the value "developers" and provide access to all developers with a single permission grant. As new developers join with the matching tag and value, no additional policy modifications are required.

This feature is available in all AWS Regions where SageMaker Lakehouse is available. To get started, read the launch blog and the ABAC documentation. View the full article
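The "team = developers" example translates to ordinary IAM tagging; the role name and account ID below are placeholders, and the grant itself is then defined once in Lake Formation with a matching key/value condition.

```bash
# Hypothetical: tag an existing job role so it picks up the single ABAC grant
# defined for team=developers in Lake Formation.
aws iam tag-role \
  --role-name etl-developer-role \
  --tags Key=team,Value=developers

# Principals assuming the role can also pass the attribute as an STS session tag
# (this requires sts:TagSession permission on the role's trust policy).
aws sts assume-role \
  --role-arn arn:aws:iam::123456789012:role/etl-developer-role \
  --role-session-name dev-session \
  --tags Key=team,Value=developers
```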
-
AWS AppConfig now supports Internet Protocol Version 6 (IPv6)
AWS AppConfig now supports dual-stack endpoints, facilitating connectivity through Internet Protocol Version 6. The existing AWS AppConfig endpoints supporting IPv4 will remain available for backwards compatibility. The continuous growth of the internet has created an urgent need for IPv6 adoption, as IPv4 address space reaches its limits. Through AWS AppConfig's implementation of dual-stack endpoints, organizations can execute a strategic transition to IPv6 architecture on their own timeline. This approach enables companies to satisfy IPv6 regulatory standards while preserving IPv4 connectivity for systems that have not yet moved to IPv6 capabilities. IPv6 support for AWS AppConfig resources is available in all AWS Regions, including the AWS GovCloud (US) Regions. To get started, use the AWS AppConfig Getting Started Guide, or read more at Understanding IPv6 support for AWS AppConfig. View the full article
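A hedged sketch of pointing a client at a dual-stack endpoint: the hostname below assumes the common &lt;service&gt;.&lt;region&gt;.api.aws dual-stack pattern and should be confirmed in the AWS AppConfig endpoint documentation before use.

```bash
# Hypothetical: call the AppConfig control plane via the assumed dual-stack
# (IPv4/IPv6) endpoint instead of the default IPv4-only endpoint.
aws appconfig list-applications \
  --region us-east-1 \
  --endpoint-url https://appconfig.us-east-1.api.aws
```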
-
The Importance of Routine for Children
Establishing a daily routine is essential for children's development and well-being. At Little Mowgli, we emphasize the significance of routines in providing children with a sense of security and stability. Consistent schedules help children understand what to expect, reducing anxiety and promoting a sense of control over their environment. Routines also play a vital role in teaching children important life skills, such as time management and responsibility. By involving children in creating their daily schedules, parents can encourage independence and decision-making. This not only helps children feel empowered but also fosters a sense of accomplishment as they complete tasks. We encourage families to create routines that include time for learning, play, and relaxation. At Little Mowgli, we provide tips and templates for establishing effective routines that cater to the unique needs of each child, ensuring a balanced and fulfilling daily life. Visit here: Little Mowgli Nursery in Leyland
-
Announcing: Heroic Labs Satori Integration with Databricks
Unleashing the Power of Predictive Analytics and LiveOps: Satori and Databricks Integration
In the dynamic world of game development, data is the ultimate power-up. At… View the full article
-
Full Stack Developer vs DevOps Engineer: What’s the Difference?
Choosing between a DevOps engineer and a full stack developer depends on your project needs. 1. Full stack developers are ideal if you're developing a web or mobile app from the beginning—they handle everything from the front end to the back end. 2. DevOps engineers, on the other hand, focus on automation, deployment, and maintaining scalable infrastructure. If your goal is to launch a complete product quickly, full stack development services are the better choice. But if you're optimizing or scaling an existing system, a DevOps expert is what you need.
-
2025 DLT Update: Intelligent, fully governed data pipelines
Over the past several months, we’ve made DLT pipelines faster, more intelligent, and easier to manage at scale. DLT now delivers a streamlined, high-performance foundation… View the full article
-
Enterprise AI Solutions in Motion: Transforming How Business Thinks
Enterprise AI solutions fundamentally transform business operations by utilizing data to reveal insights and enhance decision-making processes. View the full article
-
Amazon Redshift adds history mode support to 8 third-party SaaS applications
Amazon Redshift now supports history mode for zero-ETL integrations with eight third-party applications including Salesforce, ServiceNow, and SAP. This addition complements existing history mode support for Amazon Aurora PostgreSQL-compatible and MySQL-compatible, DynamoDB, and RDS for MySQL databases. The expansion enables you to track historical data changes without Extract, Transform, and Load (ETL) processes, simplifying data management across AWS and third-party applications. History Mode for zero-ETL integrations with third-party applications lets customers easily run advanced analytics on historical data from their applications, build comprehensive lookback reports, and perform trend analysis and data auditing across multiple zero-ETL data sources. This feature preserves the complete history of data changes without maintaining duplicate copies across various external data sources, allowing organizations to meet data retention requirements while significantly reducing storage needs and operational costs. Available for both existing and new integrations, history mode offers enhanced flexibility by allowing selective enabling of historical tracking for specific tables within third-party application integrations, giving businesses precise control over their data analysis and storage strategies. To learn more about history mode for zero-ETL integrations in Amazon Redshift and how it can benefit your data analytics workflows, visit the history mode documentation. To learn more about the supported third-party applications, visit the AWS Glue documentation. To get started with zero-ETL integrations, visit the getting started guides for Amazon Redshift. View the full article
-
Prompt Optimization in Amazon Bedrock now generally available
In November 2024, we launched Prompt Optimization in Amazon Bedrock to accelerate prompt creation and engineering for foundation models (FMs). Today, we're announcing its general availability and pricing. Prompt engineering is the process of designing prompts to guide FMs to generate relevant responses. These prompts must be customized for each FM according to its best practices and guidelines, which is a time-consuming process that delays application development. With Prompt Optimization in Amazon Bedrock, you can now automatically rewrite prompts for better performance and more concise responses on Anthropic, Llama, Nova, DeepSeek, Mistral and Titan models. You can compare optimized prompts against original versions without deployment and save them in Amazon Bedrock Prompt Management for prompt lifecycle management. You can also use Prompt Optimization in the Bedrock Playground, or directly via API. Prompt Optimization is now generally available in the following AWS Regions: US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Sydney), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), and South America (São Paulo). To get started, see the following resources:
- Blog
- Amazon Bedrock Pricing
- Amazon Bedrock user guide
- Amazon Bedrock API reference
View the full article
-
Announcing AWS DMS Serverless automatic storage scaling
AWS Database Migration Service Serverless (AWS DMS Serverless) now offers storage scaling. With this enhancement, you no longer have to worry about exceeding the DMS Serverless 100GB default replication storage capacity limit when processing very large transaction volumes or using detailed logging. You can now use AWS DMS Serverless to replicate even the highest transaction volumes, since there are no longer any storage capacity limits: AWS DMS Serverless automatically increases the storage for your replications any time the existing capacity reaches its limit. To learn more, see the AWS DMS Serverless storage capacity documentation. For AWS DMS regional availability, please refer to the AWS Region Table. View the full article
-
Analyzing Java applications performance with async-profiler in Amazon EKS
This blog was authored by Sascha Möllering, Principal Specialist Solutions Architect Containers, and Yuriy Bezsonov, Senior Partner Solutions Architect.

Introduction

Container startup performance presents a significant challenge for Java applications running on Kubernetes, particularly during scaling events and recovery scenarios. Although Java remains a top choice for both legacy modernization and new microservices development, especially with frameworks such as Spring Boot or Quarkus, containerized Java applications often face unique challenges around startup times and runtime behavior. Performance profiling in containerized Java applications has long presented significant challenges. async-profiler, a lightweight sampling solution from the Amazon Corretto team, offers an interesting approach for Java workloads running on Amazon Elastic Kubernetes Service (Amazon EKS). Eliminating traditional safepoint bias issues (more information about Java safepoints can be found in this post) enables more accurate performance analysis. In this post we explore practical implementations for both on-demand and continuous profiling scenarios, using the Mountpoint for Amazon S3 Container Storage Interface (CSI) driver to efficiently manage profiling data in your Kubernetes environment.

Solution overview

To avoid depending on the JVM reaching a safepoint before profiling, async-profiler features HotSpot-specific APIs to collect stack traces. It works with OpenJDK and other Java runtimes based on the HotSpot JVM. async-profiler can trace the following kinds of events:

- CPU cycles.
- Hardware and software performance counters such as cache misses, branch misses, page faults, context switches, etc.
- Allocations in the Java heap.
- Contended lock attempts, including both Java object monitors and ReentrantLocks.
- Wall-clock time (also called wall time), which is the time it takes to run a block of code.

We use UnicornStore as an example of a Java application to be profiled, as shown in the following figure. UnicornStore is a Spring Boot 3 Java application that provides a RESTful API. It stores data in a relational database running on Amazon Aurora Serverless with the Amazon Relational Database Service (Amazon RDS) for PostgreSQL engine and afterward publishes an event about performed actions to Amazon EventBridge. We use Amazon EKS Auto Mode to deploy a ready-to-use Kubernetes cluster.

Figure 1: UnicornStore application architecture

Prerequisites

The solution is based on the infrastructure as code (IaC) of the "Java on AWS Immersion Day", which streamlines the setup of the environment. You only need an AWS account and AWS CloudShell to bootstrap the environment.

Walkthrough

The following steps walk you through this solution.

Setting up the environment

You can use the following setup to create the solution infrastructure with Visual Studio Code for the Web and with all the necessary tools installed:

1. Navigate to CloudShell in the AWS console.
2. Deploy the AWS CloudFormation template. You can also deploy the template directly in the CloudFormation console using the file from the provided link.
```bash
curl https://raw.githubusercontent.com/aws-samples/java-on-aws/main/infrastructure/cfn/unicornstore-stack.yaml > unicornstore-stack.yaml
CFN_S3=cfn-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]')
aws s3 mb s3://$CFN_S3
aws cloudformation deploy --stack-name unicornstore-stack \
    --template-file ./unicornstore-stack.yaml \
    --s3-bucket $CFN_S3 \
    --capabilities CAPABILITY_NAMED_IAM
aws cloudformation describe-stacks --stack-name unicornstore-stack --query "Stacks[0].Outputs[?OutputKey=='IdeUrl'].OutputValue" --output text
aws cloudformation describe-stacks --stack-name unicornstore-stack --query "Stacks[0].Outputs[?OutputKey=='IdePassword'].OutputValue" --output text
```

Wait until the command finishes successfully. The deployment takes about 15-20 minutes. After successful creation of the CloudFormation stacks, you can access the VS Code instance using the IdeUrl and the IdePassword from the output of the preceding command. All following commands must be run in the Terminal window of the VS Code instance.

Instrumenting a container image with a profiler

In this post you use wall-clock profiling. For functions specifically, wall-clock time measures the total duration from when the function starts until it completes. This measurement encompasses all delays, such as time spent waiting for locks to release and threads to synchronize. When wall-clock time exceeds CPU time, it suggests your code is spending time in a waiting state. A significant gap between these times often points to potential resource bottlenecks in your application. Conversely, when CPU time closely matches wall-clock time, it indicates computationally heavy code, where the processor is actively working for most of the running period. These CPU-bound code segments that take considerable time to run may benefit from performance optimization efforts.

Add profiler binaries to a container image. Use a multi-stage build to build the container image:

```bash
cat <<'EOF' > ~/environment/unicorn-store-spring/Dockerfile
FROM public.ecr.aws/docker/library/maven:3-amazoncorretto-21-al2023 AS builder
RUN yum install -y wget tar gzip
RUN cd /tmp && \
    wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz && \
    mkdir /async-profiler && \
    tar -xvzf ./async-profiler-3.0-linux-x64.tar.gz -C /async-profiler --strip-components=1
COPY ./pom.xml ./pom.xml
COPY src ./src/
RUN mvn clean package && mv target/store-spring-1.0.0-exec.jar store-spring.jar
RUN rm -rf ~/.m2/repository

FROM public.ecr.aws/docker/library/amazoncorretto:21-al2023
RUN yum install -y shadow-utils procps tar
COPY --from=builder /async-profiler/ /async-profiler/
COPY --from=builder store-spring.jar store-spring.jar
RUN groupadd --system spring -g 1000
RUN adduser spring -u 1000 -g 1000
ENV SPRING_THREADS_VIRTUAL_ENABLED=false
USER 1000:1000
EXPOSE 8080
ENTRYPOINT ["java","-jar","-Dserver.port=8080","/store-spring.jar"]
EOF
```

Build and push a new container image to the Amazon Elastic Container Registry (Amazon ECR):

```bash
~/java-on-aws/infrastructure/scripts/deploy/containerize.sh
```

Deploy the Java application to the EKS cluster:

```bash
~/java-on-aws/infrastructure/scripts/deploy/eks.sh
kubectl get pods -n unicorn-store-spring
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
echo $POD_NAME
```

The deployment takes about 3-5 minutes.

On-demand profiling

Now you can start on-demand profiling and benchmark the Java application under load.
1. Start on-demand profiling in the container and get the status (line 1), create load with Artillery for one minute with 200 concurrent POST requests of createUnicorn (line 3), create a folder for profiling results, and then stop the profiling when the benchmarking is finished (line 4):

```bash
kubectl exec -it $POD_NAME -n unicorn-store-spring -- /bin/bash -c "/async-profiler/bin/asprof start -e wall jps && /async-profiler/bin/asprof status jps"
SVC_URL=$(~/java-on-aws/infrastructure/scripts/test/getsvcurl.sh eks) && echo $SVC_URL
~/java-on-aws/infrastructure/scripts/test/benchmark.sh $SVC_URL 60 200
kubectl exec -it $POD_NAME -n unicorn-store-spring -- /bin/bash -c "mkdir -p /home/spring/profiling && /async-profiler/bin/asprof stop -f /home/spring/profiling/profile-%t.html jps"
```

2. The resulting file is stored in the container's /home/spring/profiling/ folder. Copy the resulting file to the development instance. Save the Summary report output from the benchmarking tool for further comparison.

```bash
kubectl -n unicorn-store-spring cp $POD_NAME:/home/spring/profiling ~/environment/unicorn-store-spring
```

You can download the resulting file to your computer using right-click on the file name. Choose Download ... and open the file in a browser.

Figure 2: Download on-demand profiling results

Analyzing the profiling results and performing optimization

As the result you get a Flame Graph.

1. Choose the Search button in the top left corner and search for the UnicornController.createUnicorn method. A significant portion of the run time is waiting for a database connection. You can improve this by increasing the number of database connections in the Java application properties file, as shown in the following figure.

Figure 3: Initial on-demand profiling results

2. Change the number of database connections from 1 to 10:

```bash
sed -i 's/spring\.datasource\.hikari\.maximumPoolSize=[0-9]*/spring.datasource.hikari.maximumPoolSize=10/' \
    ~/environment/unicorn-store-spring/src/main/resources/application.properties
```

Assuming that you have addressed what causes waiting time in the requests by changing the datasource value to 10, you can benchmark it with profiling and compare the results.

3. Build and push a new version of the container image to the Amazon ECR container registry:

```bash
~/java-on-aws/infrastructure/scripts/deploy/containerize.sh
```

4. Redeploy the application to the EKS cluster:

```bash
kubectl rollout restart deployment unicorn-store-spring -n unicorn-store-spring
kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring
sleep 15
kubectl get pods -n unicorn-store-spring
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
echo $POD_NAME
```

5. Repeat the benchmark test with profiling using the same commands or with the following script, and download the results:

```bash
~/java-on-aws/infrastructure/scripts/test/profiling.sh
kubectl -n unicorn-store-spring cp $POD_NAME:/home/spring/profiling ~/environment/unicorn-store-spring
```

6. Open the downloaded file in a browser and search for UnicornController.createUnicorn, as shown in the following figure.

Figure 4: Improved on-demand profiling results

You can see that the waiting time for getting a connection is significantly decreased. The benchmarks for response time were improved, too, as shown in the following figure.

Figure 5: On-demand results comparison

Setting up the infrastructure to store results of continuous profiling

On-demand profiling helps find problems with the application, but those issues can happen at any time.
It is a great advantage to be able to go back in time and investigate the state of a Java application when the issue happened. To achieve that, you can set up continuous profiling, create trace files, and store them in Amazon S3.

1. Create an S3 bucket to store the results of continuous profiling:

```bash
export S3PROFILING=profiling-data-$(uuidgen | tr -d - | tr '[:upper:]' '[:lower:]')
aws s3 mb s3://$S3PROFILING
```

2. Create an AWS Identity and Access Management (IAM) policy to allow pods with the Java application to access the S3 bucket:

```bash
cat <<EOF > service-account-s3-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:*",
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::$S3PROFILING",
                "arn:aws:s3:::$S3PROFILING/*"
            ]
        }
    ]
}
EOF
aws iam create-policy --policy-name unicorn-eks-service-account-s3-policy --policy-document file://service-account-s3-policy.json
rm service-account-s3-policy.json
```

For the access to the S3 bucket, use the Mountpoint for Amazon S3 Container Storage Interface (CSI) driver, which is deployed as an Amazon EKS add-on. This driver allows the Java application running in the EKS cluster to put files to Amazon S3 through a file system interface. Built on Mountpoint for Amazon S3, the Mountpoint CSI driver presents an S3 bucket as a storage volume accessible by containers in a Kubernetes cluster. At the time of writing, there is no solution like the Amazon EBS CSI driver for Amazon Elastic Container Service (Amazon ECS). However, the documentation describes how a similar approach can be implemented for an ECS cluster with Amazon Elastic Compute Cloud (Amazon EC2) instances.

3. Associate an IAM OIDC provider with the EKS cluster and create an IAM role for the add-on:

```bash
eksctl utils associate-iam-oidc-provider --region=$AWS_REGION \
    --cluster=unicorn-store --approve
eksctl create iamserviceaccount --cluster unicorn-store \
    --name s3-csi-driver-sa --namespace kube-system \
    --attach-policy-arn=$(aws iam list-policies --query 'Policies[?PolicyName==`unicorn-eks-service-account-s3-policy`].Arn' --output text) \
    --approve --region=$AWS_REGION \
    --role-name unicorn-eks-s3-csi-driver-role --role-only
```

We use IAM roles for service accounts (IRSA) and not EKS Pod Identities due to the current limitations of the Mountpoint for Amazon S3 CSI driver. Refer to the official installation guide for the updated installation procedure.

4. Install the Mountpoint for Amazon S3 CSI driver as an add-on to the EKS cluster:

```bash
eksctl create addon --name aws-mountpoint-s3-csi-driver --cluster unicorn-store \
    --service-account-role-arn arn:aws:iam::$ACCOUNT_ID:role/unicorn-eks-s3-csi-driver-role --force
```

5. Create a PersistentVolume and PersistentVolumeClaim to access the S3 bucket from pods:

```bash
~/java-on-aws/infrastructure/scripts/deploy/s3pv.sh
```

6. Deploy the manifests for persistent objects to the EKS cluster:

```bash
kubectl apply -f ~/environment/unicorn-store-spring/k8s/persistence.yaml
```

Instrumenting a container image and deployment for continuous profiling

1. Override the command and arguments from the Dockerfile in the deployment and start profiling using the launch-as-agent instruction. With this approach you can change profiling parameters without a need to rebuild the container image. async-profiler starts with the Java application and creates call stacks each minute.
Add the commands to deployment.yaml. Open the file:

```bash
code ~/environment/unicorn-store-spring/k8s/deployment.yaml
```

Then update the container spec as follows:

```yaml
apiVersion: apps/v1
…
      containers:
        - name: unicorn-store-spring
          command: ["/bin/sh", "-c"]
          args:
            - mkdir -p /profiling/$HOSTNAME && cd /profiling/$HOSTNAME;
              java -agentpath:/async-profiler/lib/libasyncProfiler.so=start,event=wall,file=./profile-%t.txt,loop=1m,collapsed -jar -Dserver.port=8080 /store-spring.jar;
          …
          securityContext:
            runAsNonRoot: true
            allowPrivilegeEscalation: false
          volumeMounts:
            - name: persistent-storage
              mountPath: /profiling
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: s3-profiling-pvc
```

2. Deploy the manifests to the EKS cluster and restart the deployment:

```bash
kubectl apply -f ~/environment/unicorn-store-spring/k8s/deployment.yaml
kubectl rollout status deployment unicorn-store-spring -n unicorn-store-spring
sleep 15
kubectl get pods -n unicorn-store-spring
```

3. Check the state of the Java application pod and profiler:

```bash
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
echo $POD_NAME
kubectl logs $POD_NAME -n unicorn-store-spring | grep "Profiling started"
```

4. The output of the command should be similar to the following:

```
unicorn-store-spring-866847c8d8-rx82w
Profiling started
```

Analyzing the results of continuous profiling

1. Create a load for five minutes:

```bash
SVC_URL=$(~/java-on-aws/infrastructure/scripts/test/getsvcurl.sh eks)
echo $SVC_URL
~/java-on-aws/infrastructure/scripts/test/benchmark.sh $SVC_URL 300 200
```

Each minute the profiler creates profile-YYYYMMDD-HHMISS.txt in the S3 bucket path corresponding to a pod name, as shown in the following figure.

Figure 6: Continuous profiling results on Amazon S3

You can convert any of those files to a Flame Graph using converter.jar from async-profiler.

2. Create a folder for profiling stacks and copy the files from the S3 bucket:

```bash
POD_NAME=$(kubectl get pods -n unicorn-store-spring | grep Running | awk '{print $1}')
mkdir -p ~/environment/unicorn-store-spring/stacks/$POD_NAME
aws s3 cp s3://$S3PROFILING/$POD_NAME ~/environment/unicorn-store-spring/stacks/$POD_NAME/ --recursive
```

3. Download async-profiler to the development instance:

```bash
cd ~/environment/unicorn-store-spring
wget https://github.com/async-profiler/async-profiler/releases/download/v3.0/async-profiler-3.0-linux-x64.tar.gz
mkdir ~/environment/unicorn-store-spring/async-profiler
tar -xvzf ./async-profiler-3.0-linux-x64.tar.gz -C ~/environment/unicorn-store-spring/async-profiler --strip-components=1
rm ./async-profiler-3.0-linux-x64.tar.gz
```

4. Choose one of the files and convert it to a Flame Graph, for example, the first file:

```bash
cd ~/environment/unicorn-store-spring
STACK_FILE=$(find ~/environment/unicorn-store-spring/stacks/$POD_NAME -type f -printf '%T+ %p\n' | sort | head -n 1 | cut -d' ' -f2-)
java -cp ./async-profiler/lib/converter.jar FlameGraph $STACK_FILE ./profile.html
```

5. Download profile.html to your computer, open it with a browser and search for UnicornController.createUnicorn, as shown in the following figure.

Figure 7: Continuous profiling graph

This approach allows you to analyze the state of a Java application during a specific period in time, such as the startup phase.

Cleaning up
1. To avoid incurring future charges, delete the deployed AWS resources with the following commands in the VS Code terminal:

```bash
~/java-on-aws/infrastructure/scripts/cleanup/eks.sh
aws cloudformation delete-stack --stack-name eksctl-unicorn-store-addon-iamserviceaccount-kube-system-s3-csi-driver-sa
aws s3 rm s3://$S3PROFILING --recursive
aws s3 rb s3://$S3PROFILING
```

2. Close the tab with VS Code, open CloudShell and run the commands to finish cleaning up:

```bash
aws cloudformation delete-stack --stack-name unicornstore-stack
```

The deletion of the stack can take about 20 minutes. Delete the S3 bucket that you used to deploy the AWS CloudFormation template. Check the remaining resources and stacks and delete them manually if necessary.

Conclusion

In this post we demonstrated the use of async-profiler with Amazon EKS, either on demand or in a continuous profiling mode. We initially set up the infrastructure with an EKS cluster and instrumented the UnicornStore Java application container image with async-profiler. We built and uploaded the container image to Amazon ECR, deployed it to the EKS cluster, and ran on-demand profiling under load. Moreover, we created an Amazon S3 bucket and connected it to persistent volumes in the EKS cluster using Mountpoint for Amazon S3 with the corresponding CSI driver. After successful deployment of a pod based on the created container image, the results of the continuous profiling of the Java application were stored in the S3 bucket. With the help of async-profiler we found a bottleneck in the Java application and eliminated it. We also created a solution that helps to continuously create profiling data for further analysis. If you want to dive deeper into the internals of profiling with async-profiler, we recommend this three-hour playlist to learn about all of its features. We hope we have given you some ideas on how you can profile your existing Java application using async-profiler. Feel free to submit enhancements to the sample application in the source repository. View the full article