Search the Community
Showing results for tags 'streaming'.
-
Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. The sources of this data, such as clickstream events, change data capture (CDC), application and service logs, and Internet of Things (IoT) data streams, are proliferating. Snowflake offers two options to bring streaming data into its platform: Snowpipe and Snowpipe Streaming. Snowpipe is suitable for file ingestion (batching) use cases, such as loading large files from Amazon Simple Storage Service (Amazon S3) to Snowflake. Snowpipe Streaming, a newer feature released in March 2023, is suitable for rowset ingestion (streaming) use cases, such as loading a continuous stream of data from Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Before Snowpipe Streaming, AWS customers used Snowpipe for both use cases: file ingestion and rowset ingestion. First, you ingested streaming data into Kinesis Data Streams or Amazon MSK, then used Amazon Data Firehose to aggregate and write streams to Amazon S3, and finally used Snowpipe to load the data into Snowflake. However, this multi-step process can result in delays of up to an hour before data is available for analysis in Snowflake. Moreover, it’s expensive, especially when you have small files that Snowpipe has to upload to the Snowflake customer cluster.

To solve this issue, Amazon Data Firehose now integrates with Snowpipe Streaming, enabling you to capture, transform, and deliver data streams from Kinesis Data Streams, Amazon MSK, and Firehose Direct PUT to Snowflake in seconds at a low cost. With a few clicks on the Amazon Data Firehose console, you can set up a Firehose stream to deliver data to Snowflake. There are no commitments or upfront investments to use Amazon Data Firehose, and you only pay for the amount of data streamed.
Some key features of Amazon Data Firehose include:
- Fully managed serverless service – You don’t need to manage resources, and Amazon Data Firehose automatically scales to match the throughput of your data source without ongoing administration.
- Straightforward to use with no code – You don’t need to write applications.
- Real-time data delivery – You can get data to your destinations quickly and efficiently in seconds.
- Integration with over 20 AWS services – Seamless integration is available for many AWS services, such as Kinesis Data Streams, Amazon MSK, Amazon VPC Flow Logs, AWS WAF logs, Amazon CloudWatch Logs, Amazon EventBridge, AWS IoT Core, and more.
- Pay-as-you-go model – You only pay for the data volume that Amazon Data Firehose processes.
- Connectivity – Amazon Data Firehose can connect to public or private subnets in your VPC.

This post explains how you can bring streaming data from AWS into Snowflake within seconds to perform advanced analytics. We explore common architectures and illustrate how to set up a low-code, serverless, cost-effective solution for low-latency data streaming.

Overview of solution
The following are the steps to implement the solution to stream data from AWS to Snowflake:
- Create a Snowflake database, schema, and table.
- Create a Kinesis data stream.
- Create a Firehose delivery stream with Kinesis Data Streams as the source and Snowflake as its destination using a secure private link.
- To test the setup, generate sample stream data from the Amazon Kinesis Data Generator (KDG) with the Firehose delivery stream as the destination.
- Query the Snowflake table to validate the data loaded into Snowflake.

The solution is depicted in the following architecture diagram.

Prerequisites
You should have the following prerequisites:
- An AWS account and access to the following AWS services:
  - AWS Identity and Access Management (IAM)
  - Kinesis Data Streams
  - Amazon S3
  - Amazon Data Firehose
- Familiarity with the AWS Management Console.
- A Snowflake account.
- A key pair generated and your user configured to connect securely to Snowflake. For instructions, refer to the following:
  - Generate the private key
  - Generate a public key
  - Store the private and public keys securely
  - Assign the public key to a Snowflake user
  - Verify the user’s public key fingerprint
- An S3 bucket for error logging.
- The KDG set up. For instructions, refer to Test Your Streaming Data Solution with the New Amazon Kinesis Data Generator.

Create a Snowflake database, schema, and table
Complete the following steps to set up your data in Snowflake:
- Log in to your Snowflake account and create the database:
    create database adf_snf;
- Create a schema in the new database:
    create schema adf_snf.kds_blog;
- Create a table in the new schema (the table name is fully qualified here so that it matches the query used later in this post):
    create or replace table adf_snf.kds_blog.iot_sensors (
      sensorId number,
      sensorType varchar,
      internetIP varchar,
      connectionTime timestamp_ntz,
      currentTemperature number
    );

Create a Kinesis data stream
Complete the following steps to create your data stream:
- On the Kinesis Data Streams console, choose Data streams in the navigation pane.
- Choose Create data stream.
- For Data stream name, enter a name (for example, KDS-Demo-Stream).
- Leave the remaining settings as default.
- Choose Create data stream.

Create a Firehose delivery stream
Complete the following steps to create a Firehose delivery stream with Kinesis Data Streams as the source and Snowflake as its destination:
- On the Amazon Data Firehose console, choose Create Firehose stream.
- For Source, choose Amazon Kinesis Data Streams.
- For Destination, choose Snowflake.
- For Kinesis data stream, browse to the data stream you created earlier.
- For Firehose stream name, leave the default generated name or enter a name of your preference.
- Under Connection settings, provide the following information to connect Amazon Data Firehose to Snowflake:
  - For Snowflake account URL, enter your Snowflake account URL.
  - For User, enter the user name generated in the prerequisites.
  - For Private key, enter the private key generated in the prerequisites. Make sure the private key is in PKCS8 format. Do not include the PEM header (the BEGIN line) or footer (the END line) as part of the private key. If the key is split across multiple lines, remove the line breaks.
  - For Role, select Use custom Snowflake role and enter the Snowflake role that has access to write to the database table.

You can connect to Snowflake using public or private connectivity. If you don’t provide a VPC endpoint, the default connectivity mode is public. To allowlist Firehose IP addresses in your Snowflake network policy, refer to Choose Snowflake for Your Destination. If you’re using a private link URL, provide the VPCE ID using SYSTEM$GET_PRIVATELINK_CONFIG:
    select SYSTEM$GET_PRIVATELINK_CONFIG();
This function returns a JSON representation of the Snowflake account information necessary to facilitate the self-service configuration of private connectivity to the Snowflake service, as shown in the following screenshot.

- For this post, we’re using a private link, so for VPCE ID, enter the VPCE ID.
- Under Database configuration settings, enter your Snowflake database, schema, and table names.
- In the Backup settings section, for S3 backup bucket, enter the bucket you created as part of the prerequisites.
- Choose Create Firehose stream.

Alternatively, you can use an AWS CloudFormation template to create the Firehose delivery stream with Snowflake as the destination rather than using the Amazon Data Firehose console.
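The private key formatting rules in the connection settings above (PKCS8 format, no PEM header or footer, no line breaks) can be applied with a few lines of Python before pasting the key into the console. This is a minimal sketch rather than part of the original post: the function name `to_single_line_key` is hypothetical, the sample key body is made-up placeholder text, and it assumes the input is an unencrypted PKCS8 PEM file.

```python
def to_single_line_key(pem_text: str) -> str:
    """Strip the PEM header/footer lines and all line breaks from a key."""
    body_lines = []
    for line in pem_text.splitlines():
        line = line.strip()
        # Skip blank lines and the -----BEGIN ...----- / -----END ...----- markers.
        if not line or line.startswith("-----"):
            continue
        body_lines.append(line)
    return "".join(body_lines)


# Placeholder key material for illustration only (not a real key).
sample_pem = """-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASC
BKcwggSjAgEAAoIBAQC7VJTUt9Us8cKj
-----END PRIVATE KEY-----
"""

single_line_key = to_single_line_key(sample_pem)
print(single_line_key)
```

In practice you would read the PEM text from the key file created in the prerequisites and avoid printing or logging the result, since it is a secret.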
To use the CloudFormation stack, choose

Generate sample stream data
Generate sample stream data from the KDG with the Kinesis data stream you created:
    {
      "sensorId": {{random.number(999999999)}},
      "sensorType": "{{random.arrayElement(["Thermostat","SmartWaterHeater","HVACTemperatureSensor","WaterPurifier"])}}",
      "internetIP": "{{internet.ip}}",
      "connectionTime": "{{date.now("YYYY-MM-DDTHH:mm:ss")}}",
      "currentTemperature": {{random.number({"min":10,"max":150})}}
    }

Query the Snowflake table
Query the Snowflake table:
    select * from adf_snf.kds_blog.iot_sensors;
You can confirm that the data generated by the KDG and sent to Kinesis Data Streams is loaded into the Snowflake table through Amazon Data Firehose.

Troubleshooting
- If data is not loaded into Kinesis Data Streams after the KDG sends data to the Firehose delivery stream, refresh and make sure you are logged in to the KDG.
- If you made any changes to the Snowflake destination table definition, recreate the Firehose delivery stream.

Clean up
To avoid incurring future charges, delete the resources you created as part of this exercise if you are not planning to use them further.

Conclusion
Amazon Data Firehose provides a straightforward way to deliver data to Snowpipe Streaming, enabling you to save costs and reduce latency to seconds. To try Amazon Data Firehose with Snowflake, refer to the Amazon Data Firehose with Snowflake as destination lab.

About the Authors
Swapna Bandla is a Senior Solutions Architect in the AWS Analytics Specialist SA Team. Swapna has a passion for understanding customers’ data and analytics needs and empowering them to develop cloud-based, well-architected solutions. Outside of work, she enjoys spending time with her family.

Mostafa Mansour is a Principal Product Manager – Tech at Amazon Web Services where he works on Amazon Kinesis Data Firehose. He specializes in developing intuitive product experiences that solve complex challenges for customers at scale.
When he’s not hard at work on Amazon Kinesis Data Firehose, you’ll likely find Mostafa on the squash court, where he loves to take on challengers and perfect his dropshots.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS and has over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped technology companies design and implement data analytics solutions and products.

View the full article
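The KDG record template from the "Generate sample stream data" step of the Firehose and Snowflake walkthrough above can be approximated locally, which may help when checking that records match the columns of the iot_sensors table. The following is a rough sketch under stated assumptions, not the KDG itself: `make_sensor_record` and `SENSOR_TYPES` are hypothetical names, the random IP address is a simplification of the template's `internet.ip` helper, and the timestamp uses zero-padded minutes.

```python
import json
import random
from datetime import datetime

# Sensor types copied from the KDG template in the post.
SENSOR_TYPES = [
    "Thermostat",
    "SmartWaterHeater",
    "HVACTemperatureSensor",
    "WaterPurifier",
]


def make_sensor_record(rng: random.Random) -> dict:
    """Build one record with the same keys as the iot_sensors table columns."""
    return {
        "sensorId": rng.randint(0, 999999999),
        "sensorType": rng.choice(SENSOR_TYPES),
        # Simplified stand-in for the template's internet.ip helper.
        "internetIP": ".".join(str(rng.randint(0, 255)) for _ in range(4)),
        "connectionTime": datetime.now().strftime("%Y-%m-%dT%H:%M:%S"),
        "currentTemperature": rng.randint(10, 150),
    }


rng = random.Random(42)  # seeded so repeated runs pick the same random values
record = make_sensor_record(rng)
print(json.dumps(record))
```

Each call returns a plain dictionary, so the records can be serialized with `json.dumps` and sent to the data stream by whatever producer you already use.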
Tagged with: data architecture, real-time (and 2 more)
-
Starting today, you can use the Secure Reliable Transport (SRT) protocol to broadcast to your Amazon Interactive Video Service (Amazon IVS) channels. This new protocol, in addition to RTMPS, expands options for live streaming and helps to maintain video quality when sent across varying network conditions. View the full article
-
Streaming data pipelines have become an essential component in modern data-driven organizations. These pipelines enable real-time data ingestion, processing, transformation, and analysis. In this article, we will delve into the architecture and essential details of building a streaming data pipeline.

Data Ingestion
Data ingestion is the first stage of a streaming data pipeline. It involves capturing data from various sources such as Kafka, MQTT, log files, or APIs. Common techniques for data ingestion include: View the full article
Tagged with: data streaming, streaming (and 1 more)
-
From customer interactions on e-commerce platforms to social media trends, and from sensor data in Internet of Things (IoT) devices to financial market updates, streaming data encompasses a vast array of information. The ability to handle this real-time flow often distinguishes successful organizations from their competitors. Harnessing the potential of streaming data processing offers organizations an opportunity to stay at the forefront of their industries, make data-informed decisions with unprecedented agility, and gain invaluable insights into customer behavior and operational efficiency. AWS provides a foundation for building robust and reliable data pipelines that efficiently transport streaming data, eliminating the intricacies of infrastructure management. This shift empowers engineers to focus their talents and energies on creating business value, rather than spending their time managing infrastructure... View the full article
Tagged with: architecture, streaming (and 2 more)
-
AWS Glue streaming ETL (extract, transform, and load) can now detect compressed data streaming from Amazon Kinesis, Amazon Managed Streaming for Apache Kafka (Amazon MSK), and self-managed Apache Kafka. It can then automatically decompress this data without customers having to write code, saving them development hours. AWS Glue streaming ETL jobs continuously consume data from streaming sources, clean and transform the data in flight, and make it available for analysis in seconds. Customers compress data prior to streaming in order to improve performance and to avoid throttling limits in Amazon Kinesis and Amazon MSK. Prior to this feature, customers had to write user-defined functions to decompress data from a stream, which is time-consuming. View the full article
-
We are excited to announce the availability of server-side HTTP streaming for your serverless applications running on Cloud Run (fully managed). With this enhanced networking capability, your Cloud Run services can serve larger responses or stream partial responses to clients during the span of a single request, enabling quicker server response times for your applications... Read Article
4 replies
Tagged with: gcp, serverless (and 2 more)
-
The Amazon WorkSpaces Streaming Protocol (WSP) is now generally available. WSP is a cloud-native streaming protocol that enables a consistent user experience when accessing your WorkSpaces across global distances and unreliable networks. View the full article
1 reply
Tagged with: aws, workspaces (and 1 more)
-
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for you to add speech-to-text capabilities to your applications. Today, we are excited to announce native support for Ogg Opus and FLAC encoded audio in Amazon Transcribe for streaming transcription. Previously, you were required to transcode audio streams with these encodings to PCM encoding, which added extra costs and scaling challenges for large workloads. View the full article
1 reply
Tagged with: aws, transcribe (and 3 more)
-
Amazon Transcribe Medical is a HIPAA-eligible automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capabilities to their healthcare and life science applications. Until now, Amazon Transcribe Medical could transcribe speech for medical specialties under the broader Primary Care umbrella, such as internal medicine, family medicine, obstetrics-gynecology (OB-GYN), and pediatrics. Starting today, the service expands streaming transcription support to five new medical specialties: cardiology, oncology, neurology, radiology, and urology. View the full article
1 reply
Tagged with: aws, transcribe (and 2 more)
-
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for you to add speech-to-text capabilities to your applications. Today, we are excited to launch Brazilian Portuguese, Japanese and Korean language support for Transcribe streaming. To deliver streaming transcriptions with low latency for these languages, we are also announcing availability of Amazon Transcribe streaming in the South America (Sao Paulo), Asia Pacific (Tokyo) and Asia Pacific (Seoul) regions. Now you can transcribe live media content in these languages with ease. View the full article
1 reply
Tagged with: aws, transcribe (and 1 more)
-
Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for you to add speech-to-text capabilities to your applications. Today, we are excited to announce German and Italian language support for streaming audio. We are also announcing availability of Amazon Transcribe streaming in the EU (London) and EU (Frankfurt) regions. These new languages and regions expand the markets served by Amazon Transcribe streaming and enable customers to reach a broader global audience. View the full article
1 reply
Tagged with: aws, transcribe (and 1 more)
-
Amazon Managed Streaming for Apache Kafka (Amazon MSK) now supports Apache Kafka version 2.6.0 for new and existing clusters. Apache Kafka 2.6.0 includes several bug fixes and new features that improve performance. Some key features include native APIs to manage client quotas (KIP-546) and explicit rebalance triggering to enable advanced consumer use cases (KIP-568). For a complete list of improvements and bug fixes, see the Apache Kafka release notes for 2.6.0. View the full article
-
Streaming extract, transform, and load (ETL) jobs in AWS Glue can now read data encoded in the Apache Avro format. Previously, streaming ETL jobs could read data in the JSON, CSV, Parquet, and XML formats. With the addition of Avro, streaming ETL jobs now support all the same formats as batch AWS Glue jobs. View the full article