
Reimagine Batch and Streaming Data Pipelines with Dynamic Tables, Now Generally Available



Since Snowflake’s Dynamic Tables went into preview, we have worked with hundreds of customers to understand the challenges they faced producing high-quality data quickly and at scale. The No. 1 pain point: Data pipelines are becoming increasingly complex. 

This rising complexity is a result of myriad factors. As customers aim to support their growing data needs, they’re orchestrating pipelines across batch and streaming data, managing scalable infrastructure across various vendors, and developing more advanced transformation logic across many more data sets. 

Dynamic Tables: One of Snowflake’s Fastest-Adopted Features

At Snowflake, we combat this complexity with a clear principle for engineering data pipelines: Keep it simple. Dynamic Tables are the easiest way to build data pipelines that continuously process batch and streaming data across multiple stages of transformation. Built on Snowflake’s secure, scalable Data Cloud, Dynamic Tables require no external orchestration to manage, while providing easy, declarative SQL semantics to simplify data engineering for a broad spectrum of use cases.

And customers have taken note. Dynamic Tables is one of the fastest-adopted features in Snowflake history — with thousands of daily active customers participating in the preview, trying them for their use cases and helping us shape the product. 

Boston Children’s Hospital is one of the customers building modern streaming architectures on Snowflake. By leveraging Dynamic Tables for automated orchestration, it has reduced the number of ETL tools in use while improving everything from incremental processing, dependency management and data type management to logic simplification, error handling and latency reduction. 

Such strong preview-stage adoption has also allowed us to refine Dynamic Tables at a much faster rate. We stress-tested our infrastructure, of course. More importantly, we gained insight into critical gaps and built experience running a variety of use cases for both declarative and incremental processing pipelines, which has already helped bring more than a hundred use cases into production.

Given these learnings, we are extremely excited to announce that Dynamic Tables are now generally available globally on Amazon Web Services (AWS), Microsoft Azure and Google Cloud.

What Are Dynamic Tables? 

Helping organizations accelerate data engineering to deliver curated data for analytics, AI and applications, Dynamic Tables are a new table type that you can use at every stage of your processing pipeline. Whether you’re processing batch data that needs to be refreshed daily or near real-time data that needs to be processed in minutes, Dynamic Tables allow you to create data pipelines that are easy to build, operate and evolve.

A few reasons why customers love Dynamic Tables: 

1. Declarative pipeline: Define your pipeline declaratively by expressing only the transformation logic, that is, the expected outcome. Not having to specify the steps to get there can significantly reduce complexity in data pipelines.

2. Transparent orchestration: Create pipelines of various shapes, from linear chains to directed graphs, by chaining Dynamic Tables together. Snowflake manages the orchestration and scheduling of pipeline refreshes based on your data freshness target for the whole pipeline. This is one of the most loved features of Dynamic Tables and significantly simplifies pipeline development.

3. Performance boost with incremental processing: For workloads that are well suited to incremental processing, Dynamic Tables can deliver a 10x performance improvement over an equivalent full refresh (based on internal testing).

4. One switch from batch to streaming: With a single parameter that can be changed with an ALTER command, you can control how often data is refreshed in your pipeline, which helps balance cost and data freshness.

5. Language choice: Dynamic Tables have broad support for SQL and growing Python support, so you can use your language of choice.

6. Operationalization: Dynamic Tables are fully observable and easy to operate directly in Snowsight, and programmatic interfaces let you build your own observability apps.
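As a minimal sketch of the declarative, chained and "one switch" points above, the statements below define a two-stage pipeline and then tighten its freshness target. The table, column and warehouse names (raw_orders, orders_clean, daily_revenue, transform_wh) are hypothetical and not from the original post; adjust them to your own account.

```sql
-- Hypothetical objects: raw_orders is an existing source table,
-- transform_wh is an existing warehouse.

-- Stage 1: clean the raw data. TARGET_LAG = DOWNSTREAM means this table
-- refreshes only as often as the Dynamic Tables that read from it require.
CREATE OR REPLACE DYNAMIC TABLE orders_clean
  TARGET_LAG = DOWNSTREAM
  WAREHOUSE = transform_wh
AS
  SELECT order_id, customer_id, order_ts, amount
  FROM raw_orders
  WHERE amount IS NOT NULL;

-- Stage 2: aggregate the cleaned data. Reading from orders_clean is what
-- chains the two tables; no external orchestrator is declared anywhere.
CREATE OR REPLACE DYNAMIC TABLE daily_revenue
  TARGET_LAG = '1 day'
  WAREHOUSE = transform_wh
AS
  SELECT customer_id,
         DATE_TRUNC('day', order_ts) AS order_day,
         SUM(amount) AS revenue
  FROM orders_clean
  GROUP BY customer_id, DATE_TRUNC('day', order_ts);

-- Move from a daily batch cadence to near real time by changing one parameter.
ALTER DYNAMIC TABLE daily_revenue SET TARGET_LAG = '1 minute';
```

Only the final table carries an explicit freshness target here; the upstream table inherits its refresh cadence from it, which is one way to express a target for the whole pipeline.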

Figure 1: Observe a chain of Dynamic Tables in the directed acyclic graph (DAG) view directly in Snowsight.

What’s New in General Availability?

If you’re no stranger to Dynamic Tables, general availability includes a variety of new features that will make your experience even richer: 

1. Sharing and collaboration: Dynamic Tables can now be shared across regions and clouds using Snowflake’s sharing and collaboration features. By sharing Dynamic Tables, you can easily share prepared data sets or data products with consumers in your organization, a partner organization or the broader data cloud community. This provides a seamless way to share cleaned, enriched and transformed data sets that keep themselves up to date at a cadence you specify. 

2. Disaster recovery and replication: Dynamic Tables support high availability through Snowflake replication infrastructure. You can build your production pipelines in peace knowing that you are supported with Snowflake’s disaster recovery solutions. 

3. Observability: Making Dynamic Table pipelines easy to operate was one of our overarching goals, and we are well on our way. We have added a ton of new functionality that makes Dynamic Tables more observable both via Snowsight and programmatic interfaces. In Snowsight, we added new account-level views, visibility into warehouse consumption, and the ability to suspend and resume refreshes. We also improved graph and refresh history. In our observability functions, we added new account usage views, extended retention of information schema functions and added support for consistent metadata across Snowflake observability interfaces.

Figure 2: Examples of improved observability of Dynamic Tables graph and refresh history, auto-suspend and auto-resume in Snowsight.

4. Data Cloud integrations: We added support for clustering, transient dynamic tables and governance policies (on sources of Dynamic Tables and Dynamic Tables themselves), so you can benefit from the best that the Snowflake Data Cloud has to offer. 

5. Scalability: You can now create 4x more Dynamic Tables in your account, and 10x more Dynamic Table sources feeding into another Dynamic Table. There are no longer any limits on the depth of a directed acyclic graph (DAG) that you can create. 

6. Query evolution and SELECT * support: When you add new columns to base tables, Dynamic Tables will now automatically evolve to absorb new columns and incrementally repair without needing to rebuild the Dynamic Table.

7. New documentation: We’ve added sought-after new articles to our documentation, including development best practices and guides for maximizing performance and troubleshooting pipeline issues, among various other improvements. 
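Two of the items above, observability and sharing, can be sketched in SQL. This is an illustrative sketch rather than a reference: the database, schema, table and share names (my_db, my_schema, daily_revenue, revenue_share) are hypothetical, and the privileges your role needs may differ.

```sql
-- Observability: pause and resume scheduled refreshes for one Dynamic Table.
ALTER DYNAMIC TABLE my_db.my_schema.daily_revenue SUSPEND;
ALTER DYNAMIC TABLE my_db.my_schema.daily_revenue RESUME;

-- Observability: inspect recent refreshes programmatically through the
-- information schema table function (the name is passed fully qualified).
SELECT name, state, refresh_start_time, refresh_end_time
FROM TABLE(
  INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(
    NAME => 'MY_DB.MY_SCHEMA.DAILY_REVENUE'
  )
)
ORDER BY refresh_start_time DESC
LIMIT 10;

-- Sharing: expose the Dynamic Table to consumers through a share.
CREATE SHARE revenue_share;
GRANT USAGE ON DATABASE my_db TO SHARE revenue_share;
GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE revenue_share;
GRANT SELECT ON DYNAMIC TABLE my_db.my_schema.daily_revenue TO SHARE revenue_share;
```

Consumers of the share query the table like any other shared object, while refreshes continue to run in the provider account at the cadence set there.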

And that’s just the beginning. We’ve also been working diligently to make many under-the-hood refinements to upgrade refresh performance, as well as systems stability and scalability. While these improvements do not change product interactions, you can expect to see their positive results as you transition Dynamic Table pipelines from development to production. 

We have ambitious performance goals to meet all customer use cases where they are. As we progress toward this goal, we want to make it easy for customers to understand how to derive the best Dynamic Tables performance and benefit from an efficient refresh pipeline strategy, especially when data patterns and query construction are not well suited for incremental processing. When in doubt, run the SHOW DYNAMIC TABLES command to see the refresh mode for your Dynamic Table, and the reason it was chosen. To optimize your pipeline performance, use this performance guide for a peek under the hood.
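A minimal sketch of that check, assuming a Dynamic Table named daily_revenue (a hypothetical name) exists in the current schema:

```sql
-- List matching Dynamic Tables; the output includes the refresh mode
-- and the reason that mode was chosen.
SHOW DYNAMIC TABLES LIKE 'daily_revenue';

-- Narrow the SHOW output to the columns of interest. SHOW output column
-- names are lowercase, so they must be double-quoted.
SELECT "name", "refresh_mode", "refresh_mode_reason"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

The second statement must run immediately after the SHOW command in the same session, because LAST_QUERY_ID() refers to the most recently executed query.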

Since the Dynamic Tables preview, partners across the data ecosystem have tested and integrated with Dynamic Tables to help their customers get better, faster refreshed results across a variety of use cases. The list of partners includes: Amazon QuickSight, AtScale, Astrato, Census Data, Coalesce.io, dbt, Domo, Hightouch, Looker (Google), Microsoft Power BI, MicroStrategy, Qlik, Sigma, Streamkap, Tableau (Salesforce), TDAA, and ThoughtSpot.

To get started with Dynamic Tables, watch this video about building pipelines with Dynamic Tables or dive into this quickstart guide.

The post Reimagine Batch and Streaming Data Pipelines with Dynamic Tables, Now Generally Available appeared first on Snowflake.
