Showing results for tags 'snowflake'.

  1. Legacy security information and event management (SIEM) solutions, like Splunk, are powerful tools for managing and analyzing machine-generated data. They have become indispensable for organizations worldwide, particularly for security teams. But as much as security operation center (SOC) analysts have come to rely on solutions like Splunk, there is one complaint that comes up for some: Costs can quickly add up. The issue centers around their volume-based pricing model. This model can force security teams to make difficult decisions on what data to ingest. There are a number of online threads — see here, here and here just to link to a few — dedicated to how best to control costs, while limiting how much an organization has to compromise its security. But what if security teams didn’t have to make tradeoffs? This blog post explores how Snowflake can help with this challenge. Let’s start with five cost factors organizations need to consider with their legacy SIEM solution and how Snowflake can help.

Legacy SIEM cost factors to keep in mind

Data ingestion: Traditional SIEMs often impose limits on data ingestion and data retention. Snowflake allows security teams to store all their data in a single platform and maintain it all in a readily accessible state, with virtually unlimited cloud data storage capacity. There are a few ways to ingest data into Snowflake. Security sources can be ingested directly through native means such as streaming, stages, syslog, native connectors or secure data sharing. Snowflake’s Snowpipe service helps bring in new data easily, at a price that is tailored to an organization’s needs. The most common method is Snowpipe auto ingest, which works for security teams who regularly ingest machine data. But this method isn’t for everyone, because loading data slowly, in small amounts or as many small files, can cost more than other options. Snowpipe Streaming is another method that can save security teams money. With Snowpipe Streaming there’s no need to prepare files before loading, making the cost of getting data more predictable. Security teams can also reduce their costs by loading certain datasets in batches instead of continuously. For example, they could load a large dataset that isn’t needed for instant detection three times a day instead of constantly streaming it, which can lead to significant savings.

Data retention: Many legacy SIEMs delete activity logs, transaction records and other details from their systems after a few days, weeks or months. With Snowflake, security teams don’t have to work around these data retention windows. Instead, all data is always accessible for analysis, which simplifies cost planning and the data management strategy. It also provides more reliable generation of key security metrics such as visibility coverage, SLA performance, mean time to detect (MTTD) and mean time to respond (MTTR). Snowflake also helps security teams save time by automatically compressing and encrypting the data, making it ready to query.

Detection and investigation processing: Security teams depend on detection rules to find important events automatically. These rules need computing power to analyze data and spot attacks. In the cloud, computing can be measured in various ways, like bytes scanned or CPU cycles. This affects how much it costs and how predictable the costs are for processing detections. While computing costs might not have been a concern with fixed hardware in the past, it’s a whole new game in the cloud.
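As a concrete illustration of the batch-loading option mentioned above, the sketch below runs a scheduled COPY INTO job from Python. It is a minimal example, not taken from the post: it assumes the snowflake-connector-python package, and the stage, table and connection parameters are placeholders.

```python
# Minimal sketch: load lower-urgency security logs in batches a few times a day
# instead of streaming them continuously. Assumes snowflake-connector-python and
# a target table with a single VARIANT column; all names/credentials are placeholders.
import snowflake.connector

BATCH_COPY = """
    COPY INTO raw_firewall_logs                 -- hypothetical table (one VARIANT column)
    FROM @security_landing_stage/firewall/      -- hypothetical external stage
    FILE_FORMAT = (TYPE = JSON)
    ON_ERROR = CONTINUE
"""

def run_batch_load() -> None:
    conn = snowflake.connector.connect(
        account="my_account",        # placeholder connection parameters
        user="loader_user",
        password="***",
        warehouse="SECURITY_WH",
        database="SECURITY_DB",
        schema="RAW",
    )
    try:
        # Snowflake tracks load history per file, so each run only picks up new files.
        conn.cursor().execute(BATCH_COPY)
    finally:
        conn.close()

if __name__ == "__main__":
    run_batch_load()  # schedule e.g. three times a day via cron or a Snowflake TASK
```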
For security teams, investigations require computational power to analyze collected data, similar to running detections. Some solutions utilize different engines, such as stream or batch processing, for detections and investigations, while others employ the same engine for both tasks. Snowflake gives security teams a clear view of how the query engine works at a basic level, which helps them plan the cost of their investigations effectively.

Moving away from volume ingest-based pricing

A traditional SIEM typically manages all the data ingestion, transformation, detection and investigation processing for security teams. While out-of-the-box connectors and normalization can be useful, customers end up paying more by the nature of legacy SIEMs that use ingest volume-based pricing models. It’s important here to understand how this pricing model works. Ingest volume-based pricing can vary among the different legacy SIEM vendors, but the basic principle remains the same: the more data security teams send to the SIEM for analysis, the higher the cost. By moving away from traditional volume-based pricing models, security teams can gain more control over what logs they have access to and how much they are spending. A consumption-based pricing model, like Snowflake’s, allows security teams to have all the data on hand while paying only for the compute resources they use, making security more cost-effective. Snowflake’s pricing model is designed to offer flexibility and scalability, giving security teams the ability to pay only for the resources they use without being tied to long-term contracts or upfront commitments.

How Snowflake Works

An open-architecture deployment with a modern security data lake, and best-of-breed applications from Snowflake, can keep costs down while improving an organization’s security posture. A security data lake eliminates data silos by removing limits on ingest and retention. Organizations can use a security data lake to scale resources up and down automatically and pay only for the resources they use — potentially controlling their costs without compromising their security. Security data lakes can also help analysts apply complex detection logic and security policies to log data and security tool output. Security analysts can quickly join security logs with contextual data sets, such as asset inventory, user details, configuration details and other information, to eliminate would-be false positives and identify stealthy threats. The value proposition is clear: organizations can consolidate their security data affordably and gain the flexibility to query that data at any time. Snowflake empowers organizations to make data-driven choices for long-term gain. We’ll dive into some customer success stories to show the potential of this approach.

Real customer success stories

When done right, Snowflake customers can see remarkable cost savings. Let’s take a closer look at some notable success stories across various industries. At Comcast, Snowflake’s security data lake is now an integral component of their security data fabric. Instead of employees managing on-premises infrastructure, the Comcast security data lake built on Snowflake’s elastic engine in the cloud stores over 10 petabytes (PB) of data with hot retention for over a year, saving millions of dollars. Automated sweeps of over 50,000 indicators of compromise (IOCs) across the 10-PB security data lake can now be completed in under 30 minutes.
Guild Education reports “up to 50% cost savings” from working with Snowflake, just one example of the significant financial benefits organizations can unlock with the Snowflake Data Cloud. By adopting Snowflake as its data lake for security events, corporate travel management company Navan achieved a best-of-breed security architecture that is both cost-efficient and cutting-edge. The results are impressive:

• Over 70% cost savings by adopting a modern SIEM-less architecture
• 15K+ hours saved in 8 months
• 4x improvement in MITRE ATT&CK coverage in 8 months

Ready to witness the transformative power of Snowflake? Watch our demo and discover how you can revolutionize your data management strategy, unlock substantial cost savings, and propel your organization into a new era of efficiency and innovation. Learn how you can augment your Splunk strategy with Snowflake today. The post How to Navigate the Costs of Legacy SIEMs with Snowflake appeared first on Snowflake. View the full article
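The contextual joins this post describes (security logs enriched with asset inventory to cut false positives) can be sketched with Snowpark. This is a minimal illustration, not from the post; it assumes the snowflake-snowpark-python package, and all table, column and connection names are hypothetical.

```python
# Conceptual sketch: enrich failed-login events with asset context so only
# activity on critical assets surfaces. All names below are hypothetical.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, count

session = Session.builder.configs({
    "account": "my_account",       # placeholder connection parameters
    "user": "analyst",
    "password": "***",
    "warehouse": "SECURITY_WH",
    "database": "SECURITY_DB",
    "schema": "RAW",
}).create()

auth_logs = session.table("auth_logs")  # hypothetical log table
assets = session.table("asset_inventory").select(
    col("host_id").alias("asset_host_id"), "criticality", "owner_team"
)

suspicious = (
    auth_logs.filter(col("outcome") == "FAILURE")
    .join(assets, auth_logs["host_id"] == assets["asset_host_id"])  # add asset context
    .filter(col("criticality") == "HIGH")                           # ignore low-value hosts
    .group_by("user_name", "host_id", "owner_team")
    .agg(count("event_id").alias("failed_attempts"))
    .filter(col("failed_attempts") > 10)
)
suspicious.show()
```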
  2. Amazon Data Firehose (Firehose) now offers direct integration with Snowflake Snowpipe Streaming. Firehose enables customers to reliably capture, transform, and deliver data streams into Amazon S3, Amazon Redshift, Splunk, and other destinations for analytics. With this new feature, customers can stream clickstream, application, and AWS service logs from multiple sources, including Kinesis Data Streams, to Snowflake. With a few clicks, customers can set up a Firehose stream to deliver data to Snowflake. Firehose automatically scales to stream gigabytes of data, and records are available in Snowflake within seconds. View the full article
  3. In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. To deliver on these goals, developers must have the ability to manipulate and analyze information efficiently. Yet while SQL applications have long served as the gateway to access and manage data, Python has become the language of choice for most data teams, creating a disconnect. Recognizing this shift, Snowflake is taking a Python-first approach to bridge the gap and help users leverage the power of both worlds. Our previous Python connector, aimed primarily at users who need to run SQL from a Python script, enabled a connection to Snowflake from Python applications. This traditional SQL-centric approach often challenged data engineers working in a Python environment, requiring context-switching and limiting the full potential of Python’s rich libraries and frameworks. Because the previous connector mostly communicated via SQL, it also hindered the ability to manage Snowflake objects natively in Python, restricting data pipeline efficiency and the ability to complete complex tasks. Snowflake’s new Python API (in public preview) marks a significant leap forward, offering a more streamlined, powerful solution for using Python within your data pipelines — and furthering our vision to empower all developers, regardless of experience, with a user-friendly and approachable platform.

A New Era: Introducing Snowflake’s Python API

With the new Snowflake Python API, readily available through pip install snowflake, developers no longer need to juggle between languages or grapple with cumbersome syntax. They can effortlessly leverage the power of Python for a seamless, unified experience across Snowflake workloads encompassing data engineering, Snowpark, machine learning and application development. This API is a testament to Snowflake’s commitment to a Python-first approach, offering a range of features designed to streamline workflows and enhance developer productivity. Key benefits of the new Snowflake Python API include:

• Simplified syntax and intuitive API design: Featuring a Pythonic design, the API is built on the foundation of REST APIs, which are known for their clarity and ease of use. This allows developers to interact with Snowflake objects naturally and efficiently, minimizing the learning curve and reducing development time.
• Rich functionality and support for advanced operations: The API goes beyond basic operations, offering comprehensive functionality for managing various Snowflake resources and performing complex tasks within your Python environment. This empowers developers to maximize the full potential of Snowflake through intuitive REST API calls.
• Enhanced performance and improved scalability: Designed with performance in mind, the API leverages the inherent scalability of REST APIs, enabling efficient data handling and seamless scaling to meet your growing data needs. This allows your applications to handle large data sets and complex workflows efficiently.
• Streamlined integration with existing tools and frameworks: The API seamlessly integrates with popular Python data science libraries and frameworks, enabling developers to leverage their existing skill sets and workflows effectively. This integration allows developers to combine the power of Python libraries with the capabilities of Snowflake through familiar REST API structures.
By prioritizing the developer experience and offering a comprehensive, user-friendly solution, Snowflake’s new Python API paves the way for a more efficient, productive and data-driven future.

Getting Started with the Snowflake Python API

Our Quickstart guide makes it easy to see how the Snowflake Python API can manage Snowflake objects. The API allows you to create, delete and modify tables, schemas, warehouses, tasks and much more. In this Quickstart, you’ll learn how to perform key actions — from installing the Snowflake Python API to retrieving object data and managing Snowpark Container Services. Dive in to experience how the enhanced Python API streamlines your data workflows and unlocks the full potential of Python within Snowflake. To get started, explore the comprehensive API documentation, which will guide you through every step. We recommend that Python developers prioritize the new API for data engineering tasks, since it offers a more intuitive and efficient approach compared to the legacy SQL connector. While the Python connector remains available for specific SQL use cases, the new API is designed to be your go-to solution. By general availability, we aim to achieve feature parity, empowering you to complete 100% of your data engineering tasks entirely through Python. This means you’ll only need to use SQL commands if you truly prefer them or for rare unsupported functionalities.

The New Wave of Native DevOps on Snowflake

The Snowflake Python API release is among a series of native DevOps tools becoming available on the Snowflake platform — all of which aim to empower developers of every experience level with a user-friendly and approachable platform. These benefits extend far beyond the developer team. The 2023 Accelerate State of DevOps Report, the annual report from Google Cloud’s DevOps Research and Assessment (DORA) team, reveals that a focus on user-centricity around the developer experience leads to a 40% increase in organizational performance. With intuitive tools for data engineers, data scientists and even citizen developers, Snowflake strives to enhance these advantages by fostering collaboration across your data and delivery teams. By offering the flexibility and control needed to build unique applications, Snowflake aims to become your one-stop shop for data — minimizing reliance on third-party tools for core development lifecycle use cases and ultimately reducing your total cost of ownership. We’re excited to share more innovations soon, making data even more accessible for all. For a deeper dive into Snowflake’s Python API and other native Snowflake DevOps features, register for the Snowflake Data Cloud Summit 2024. Or, experience these features firsthand at our free Dev Day event on June 6th in the Demo Zone. The post Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease appeared first on Snowflake. View the full article
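To make the object-management workflow above concrete, here is a minimal sketch using the snowflake.core module that ships with pip install snowflake. The connection parameters and object names are placeholders, and the model classes and fields reflect the public-preview API, so verify them against the documentation referenced above.

```python
# Hedged sketch: creating and listing Snowflake objects natively in Python with
# the Snowflake Python API (snowflake.core). All names/credentials are placeholders,
# and class/field names follow the public-preview docs -- double-check before use.
from snowflake.core import Root
from snowflake.core.database import Database
from snowflake.core.schema import Schema
from snowflake.core.warehouse import Warehouse
from snowflake.snowpark import Session

connection_parameters = {
    "account": "my_account",   # placeholder account identifier
    "user": "my_user",
    "password": "***",
    "role": "SYSADMIN",
}

session = Session.builder.configs(connection_parameters).create()
root = Root(session)  # entry point into the resource hierarchy

# Create a database, a schema inside it, and a warehouse -- no SQL strings needed.
root.databases.create(Database(name="PIPELINE_DB"))
root.databases["PIPELINE_DB"].schemas.create(Schema(name="STAGING"))
root.warehouses.create(
    Warehouse(name="PIPELINE_WH", warehouse_size="XSMALL", auto_suspend=60)
)

# Resources can also be listed and inspected programmatically.
for wh in root.warehouses.iter(like="PIPELINE%"):
    print(wh.name, wh.warehouse_size)
```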
  4. Today’s fast-paced world demands timely insights and decisions, which is driving the importance of streaming data. Streaming data refers to data that is continuously generated from a variety of sources. The sources of this data, such as clickstream events, change data capture (CDC), application and service logs, and Internet of Things (IoT) data streams, are proliferating. Snowflake offers two options to bring streaming data into its platform: Snowpipe and Snowflake Snowpipe Streaming. Snowpipe is suitable for file ingestion (batching) use cases, such as loading large files from Amazon Simple Storage Service (Amazon S3) to Snowflake. Snowpipe Streaming, a newer feature released in March 2023, is suitable for rowset ingestion (streaming) use cases, such as loading a continuous stream of data from Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK). Before Snowpipe Streaming, AWS customers used Snowpipe for both use cases: file ingestion and rowset ingestion. First, you ingested streaming data to Kinesis Data Streams or Amazon MSK, then used Amazon Data Firehose to aggregate and write streams to Amazon S3, followed by using Snowpipe to load the data into Snowflake. However, this multi-step process can result in delays of up to an hour before data is available for analysis in Snowflake. Moreover, it’s expensive, especially when you have small files that Snowpipe has to upload to the Snowflake customer cluster. To solve this issue, Amazon Data Firehose now integrates with Snowpipe Streaming, enabling you to capture, transform, and deliver data streams from Kinesis Data Streams, Amazon MSK, and Firehose Direct PUT to Snowflake in seconds at a low cost. With a few clicks on the Amazon Data Firehose console, you can set up a Firehose stream to deliver data to Snowflake. There are no commitments or upfront investments to use Amazon Data Firehose, and you only pay for the amount of data streamed. Some key features of Amazon Data Firehose include:

• Fully managed serverless service – You don’t need to manage resources, and Amazon Data Firehose automatically scales to match the throughput of your data source without ongoing administration.
• Straightforward to use with no code – You don’t need to write applications.
• Real-time data delivery – You can get data to your destinations quickly and efficiently in seconds.
• Integration with over 20 AWS services – Seamless integration is available for many AWS services, such as Kinesis Data Streams, Amazon MSK, Amazon VPC Flow Logs, AWS WAF logs, Amazon CloudWatch Logs, Amazon EventBridge, AWS IoT Core, and more.
• Pay-as-you-go model – You only pay for the data volume that Amazon Data Firehose processes.
• Connectivity – Amazon Data Firehose can connect to public or private subnets in your VPC.

This post explains how you can bring streaming data from AWS into Snowflake within seconds to perform advanced analytics. We explore common architectures and illustrate how to set up a low-code, serverless, cost-effective solution for low-latency data streaming.

Overview of solution

The following are the steps to implement the solution to stream data from AWS to Snowflake:
• Create a Snowflake database, schema, and table.
• Create a Kinesis data stream.
• Create a Firehose delivery stream with Kinesis Data Streams as the source and Snowflake as its destination using a secure private link.
• To test the setup, generate sample stream data from the Amazon Kinesis Data Generator (KDG) with the Firehose delivery stream as the destination.
• Query the Snowflake table to validate the data loaded into Snowflake.

The solution is depicted in the following architecture diagram.

Prerequisites

You should have the following prerequisites:
• An AWS account and access to the following AWS services: AWS Identity and Access Management (IAM), Kinesis Data Streams, Amazon S3, and Amazon Data Firehose.
• Familiarity with the AWS Management Console.
• A Snowflake account.
• A key pair generated and your user configured to connect securely to Snowflake. For instructions, refer to the following: generate the private key, generate a public key, store the private and public keys securely, assign the public key to a Snowflake user, and verify the user’s public key fingerprint.
• An S3 bucket for error logging.
• The KDG set up. For instructions, refer to Test Your Streaming Data Solution with the New Amazon Kinesis Data Generator.

Create a Snowflake database, schema, and table

Complete the following steps to set up your data in Snowflake:
• Log in to your Snowflake account and create the database:
      create database adf_snf;
• Create a schema in the new database:
      create schema adf_snf.kds_blog;
• Create a table in the new schema:
      create or replace table iot_sensors (
          sensorId number,
          sensorType varchar,
          internetIP varchar,
          connectionTime timestamp_ntz,
          currentTemperature number
      );

Create a Kinesis data stream

Complete the following steps to create your data stream:
• On the Kinesis Data Streams console, choose Data streams in the navigation pane.
• Choose Create data stream.
• For Data stream name, enter a name (for example, KDS-Demo-Stream).
• Leave the remaining settings as default.
• Choose Create data stream.

Create a Firehose delivery stream

Complete the following steps to create a Firehose delivery stream with Kinesis Data Streams as the source and Snowflake as its destination:
• On the Amazon Data Firehose console, choose Create Firehose stream.
• For Source, choose Amazon Kinesis Data Streams.
• For Destination, choose Snowflake.
• For Kinesis data stream, browse to the data stream you created earlier.
• For Firehose stream name, leave the default generated name or enter a name of your preference.
• Under Connection settings, provide the following information to connect Amazon Data Firehose to Snowflake:
  • For Snowflake account URL, enter your Snowflake account URL.
  • For User, enter the user name generated in the prerequisites.
  • For Private key, enter the private key generated in the prerequisites. Make sure the private key is in PKCS8 format. Do not include the PEM BEGIN header and END footer lines as part of the private key. If the key is split across multiple lines, remove the line breaks.
  • For Role, select Use custom Snowflake role and enter the IAM role that has access to write to the database table.

You can connect to Snowflake using public or private connectivity. If you don’t provide a VPC endpoint, the default connectivity mode is public. To allowlist Firehose IPs in your Snowflake network policy, refer to Choose Snowflake for Your Destination. If you’re using a private link URL, provide the VPCE ID using SYSTEM$GET_PRIVATELINK_CONFIG:
      select SYSTEM$GET_PRIVATELINK_CONFIG();
This function returns a JSON representation of the Snowflake account information necessary to facilitate the self-service configuration of private connectivity to the Snowflake service. For this post, we’re using a private link, so for VPCE ID, enter the VPCE ID.
• Under Database configuration settings, enter your Snowflake database, schema, and table names.
• In the Backup settings section, for S3 backup bucket, enter the bucket you created as part of the prerequisites.
• Choose Create Firehose stream.

Alternatively, you can use an AWS CloudFormation template to create the Firehose delivery stream with Snowflake as the destination rather than using the Amazon Data Firehose console. To use the CloudFormation stack, choose

Generate sample stream data

Generate sample stream data from the KDG with the Kinesis data stream you created:
      {
        "sensorId": {{random.number(999999999)}},
        "sensorType": "{{random.arrayElement( ["Thermostat","SmartWaterHeater","HVACTemperatureSensor","WaterPurifier"] )}}",
        "internetIP": "{{internet.ip}}",
        "connectionTime": "{{date.now("YYYY-MM-DDTHH:m:ss")}}",
        "currentTemperature": {{random.number({"min":10,"max":150})}}
      }

Query the Snowflake table

Query the Snowflake table:
      select * from adf_snf.kds_blog.iot_sensors;
You can confirm that the data generated by the KDG that was sent to Kinesis Data Streams is loaded into the Snowflake table through Amazon Data Firehose.

Troubleshooting

If data is not loaded into Kinesis Data Streams after the KDG sends data to the Firehose delivery stream, refresh and make sure you are logged in to the KDG. If you made any changes to the Snowflake destination table definition, recreate the Firehose delivery stream.

Clean up

To avoid incurring future charges, delete the resources you created as part of this exercise if you are not planning to use them further.

Conclusion

Amazon Data Firehose provides a straightforward way to deliver data to Snowpipe Streaming, enabling you to save costs and reduce latency to seconds. To try Amazon Data Firehose with Snowflake, refer to the Amazon Data Firehose with Snowflake as destination lab.

About the Authors

Swapna Bandla is a Senior Solutions Architect in the AWS Analytics Specialist SA Team. Swapna has a passion for understanding customers’ data and analytics needs and empowering them to develop cloud-based well-architected solutions. Outside of work, she enjoys spending time with her family.

Mostafa Mansour is a Principal Product Manager – Tech at Amazon Web Services where he works on Amazon Kinesis Data Firehose. He specializes in developing intuitive product experiences that solve complex challenges for customers at scale. When he’s not hard at work on Amazon Kinesis Data Firehose, you’ll likely find Mostafa on the squash court, where he loves to take on challengers and perfect his dropshots.

Bosco Albuquerque is a Sr. Partner Solutions Architect at AWS and has over 20 years of experience working with database and analytics products from enterprise database vendors and cloud providers. He has helped technology companies design and implement data analytics solutions and products. View the full article
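If you prefer to generate the test events from code instead of the KDG, the sketch below writes records matching the template above to the Kinesis data stream with boto3. It is an optional illustration; the region and stream name are placeholders.

```python
# Optional alternative to the KDG: push synthetic IoT sensor events that match the
# template above into the Kinesis data stream using boto3. Region/stream are placeholders.
import json
import random
import time

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")  # adjust to your region

SENSOR_TYPES = ["Thermostat", "SmartWaterHeater", "HVACTemperatureSensor", "WaterPurifier"]

def make_event() -> dict:
    return {
        "sensorId": random.randint(0, 999_999_999),
        "sensorType": random.choice(SENSOR_TYPES),
        "internetIP": ".".join(str(random.randint(1, 254)) for _ in range(4)),
        "connectionTime": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "currentTemperature": random.randint(10, 150),
    }

for _ in range(100):
    kinesis.put_record(
        StreamName="KDS-Demo-Stream",                 # the stream created earlier
        Data=json.dumps(make_event()).encode("utf-8"),
        PartitionKey=str(random.randint(0, 9)),
    )
```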
  5. Huge performance-boosting opportunities await those who choose the optimal data warehouse for their business. Identifying the custom data points that steer your organization’s successful outcomes is crucial, and decision-making is optimized through sophisticated means of accessing and analyzing your company’s data. As the use of data warehouses grows exponentially, the choices become increasingly difficult to discern […] View the full article
  6. With RudderStack, you can build your customer data platform on top of Snowflake and keep control of your data.View the full article
  7. In 2020, Snowflake announced a new global competition to recognize the work of early-stage startups building their apps — and their businesses — on Snowflake, offering up to $250,000 in investment as the top prize. Four years later, the Snowflake Startup Challenge has grown into a premier showcase for emerging startups, garnering interest from companies in over 100 countries and offering a prize package featuring a portion of up to $1 million in potential investment opportunities and exclusive mentorship and marketing opportunities from NYSE. This year’s entries presented an impressively diverse set of use cases. The list of Top 10 semi-finalists is a perfect example: we have use cases for cybersecurity, gen AI, food safety, restaurant chain pricing, quantitative trading analytics, geospatial data, sales pipeline measurement, marketing tech and healthcare. Just as varied was the list of Snowflake tech that early-stage startups are using to drive their innovative entries. Snowflake Native Apps (generally available on AWS and Azure, private preview on GCP) and Snowpark Container Services (currently in public preview) were exceptionally popular, which speaks to their flexibility, ease of use and business value. In fact, 8 of the 10 startups in our semi-finalist list plan to use one or both of these technologies in their offerings. We saw a lot of interesting AI/ML integrations and capabilities plus the use of Dynamic Tables (currently in public preview), UDFs and stored procedures, Streamlit, and Streamlit in Snowflake. Many entries also used Snowpark, taking advantage of the ability to work in the code they prefer to develop data pipelines, ML models and apps, then execute in Snowflake. Our sincere thanks go out to everyone who participated in this year’s competition. We recognize the amount of work involved in your entries, and we appreciate every submission. Let’s meet the 10 companies competing for the 2024 Snowflake Startup Challenge crown!

BigGeo

BigGeo accelerates geospatial data processing by optimizing performance and eliminating challenges typically associated with big data. Built atop BigGeo’s proprietary Volumetric and Surface-Level Discrete Global Grid System (DGGS), which manages surface-level, subsurface and aerial data, BigGeo Search allows you to perform geospatial queries against large geospatial data sets at high speed. Capable of a headless deployment into Snowpark Container Services, BigGeo can be used to speed up queries of data stored in Snowflake, gather those insights into a dashboard, visualize them on a map, and more.

Implentio

Implentio is a centralized tool that helps ecommerce ops and finance teams efficiently and cost-effectively manage fulfillment and logistics spending. The solution ingests, transforms and centralizes large volumes of operations data from disparate systems and applies AI and ML to deliver advanced optimizations, insights and analyses that help teams improve invoice reconciliation and catch 3PL/freight billing errors.

Innova-Q

Focusing on food safety and quality, Innova-Q’s Quality Performance Forecast Application delivers near real-time insights into product and manufacturing process performance so companies can assess and address product risks before they affect public safety, operational effectiveness or direct costs. The Innova-Q dashboard provides access to product safety and quality performance data, historical risk data, and analysis results for proactive risk management.
Leap Metrics

Leap Metrics is a SaaS company that seeks to improve health outcomes for populations with chronic conditions while reducing the cost of care. Their analytics-first approach to healthcare leverages AI-powered insights and workflows through natively integrated data management, analytics and care management solutions. Leap Metrics’ Sevida platform unifies actionable analytics and AI with intelligent workflows tailored for care teams for an intuitive experience.

Quilr

Quilr’s adaptive protection platform uses AI and the principle of human-centric security to reduce incidents caused by human errors, unintentional insiders and social engineering. It provides proactive assistance to employees before they perform an insecure action, without disrupting business workflow. Quilr also gives organizations visibility into their Human Risk Posture to better understand what risky behaviors their users are performing, and where they have process or control gaps that could result in breaches.

Scientific Financial Systems

Beating the market is the driving force for investment management firms — but beating the market is not easy. SFS’s Quotient provides a unique set of analytics tools based on data science and ML best practices that rapidly analyzes large amounts of data and enables accurate data calculations at scale, with full transparency into calculation details. Quotient automates data management, time-series operations and production so investment firms can focus on idea generation and building proprietary alpha models to identify market insights and investment opportunities.

SignalFlare.ai by Extropy 360

Pricing and analytics for chain restaurants is the primary focus of SignalFlare.ai, a decision intelligence solution that combines ML models for price optimization and risk simulation with geospatial expertise. Restaurants can use SignalFlare to refine and analyze customer and location data so they can better capture price opportunities and drive customer visits.

Stellar

Stellar is designed to make generative AI easy for Snowflake customers. It deploys gen AI components as containers on Snowpark Container Services, close to the customer’s data. Stellar Launchpad gives customers a conversational way to analyze and synthesize structured and unstructured data to power AI initiatives, making it possible to deploy multiple gen AI apps and virtual assistants to meet the demand for AI-driven business outcomes.

Titan Systems

Titan helps enterprises to manage, monitor and scale secure access to data in Snowflake with an infrastructure-as-code approach. Titan Core analyzes each change to your Snowflake account and evaluates them against a set of security policies, then rejects changes that are out of compliance to help catch data leaks before they happen.

Vector

Vector is a relationship intelligence platform that alerts sellers when they can break through the noise by detecting existing relationships between target accounts and happy customers, execs and investors. Vector can infer who knows whom and their connections by analyzing terabytes of contact, business, experience and IP data to determine digital fingerprints, attributes and shared experiences.

What’s next: Preparing the perfect pitch

In Round 2, each of these semi-finalists will create an investor pitch video, and their leadership team will be interviewed by the judges to discuss the company’s entry, the product and business strategy, and what the company would do with an investment should it win the 2024 Snowflake Startup Challenge.
Based on this information, the judges will select three finalists, to be announced in May. Those three companies will present to our esteemed judging panel — Benoit Dageville, Snowflake Co-Founder and President of Product; Denise Persson, Snowflake CMO; Lynn Martin, NYSE Group President; and Brad Gerstner, Altimeter Founder and CEO — during the Startup Challenge Finale at Dev Day in San Francisco on June 6. The judges will ask questions and deliberate live before naming the 2024 Grand Prize winner. Register for Dev Day now to see the live finale and experience all of the developer-centric demos and sessions, discussions, expert Q&As and hands-on labs designed to set you up for AI/ML and app dev success. Congratulations to all of the semi-finalists, and best of luck in the next round! The post Snowflake Startup Challenge 2024: Announcing the 10 Semi-Finalists appeared first on Snowflake. View the full article
  8. Snowflake names RudderStack a One to Watch in the Analytics category of its annual Modern Marketing Data Stack Report. View the full article
  9. Salesforce’s Customer Data Platform, Genie, relies on open data sharing with Snowflake. Does this signal a paradigm shift for the Customer 360? View the full article
  10. Data analytics involves storing, managing, and processing data from different sources and analyzing it thoroughly to develop solutions for your business problems. While JSON helps to interchange data between different web applications and sources through API connectors, Snowflake assists you in analyzing that data with its intuitive features. Therefore, JSON-to-Snowflake data migration is crucial […] View the full article
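As a small, hedged illustration of what landing JSON in Snowflake can look like (not taken from the article), the sketch below stores a JSON document in a VARIANT column and queries a nested field. It assumes the snowflake-connector-python package; connection details and object names are placeholders.

```python
# Illustrative sketch: land raw JSON in a VARIANT column and query nested fields.
# All connection details and names are placeholders; requires snowflake-connector-python.
import json

import snowflake.connector

order = {
    "order_id": 1001,
    "customer": {"name": "Ada", "country": "DE"},
    "items": [{"sku": "A1", "qty": 2}],
}

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="ETL_WH", database="DEMO_DB", schema="PUBLIC",
)
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS raw_orders (payload VARIANT)")
    # PARSE_JSON converts the JSON text into a queryable VARIANT value.
    cur.execute("INSERT INTO raw_orders SELECT PARSE_JSON(%s)", (json.dumps(order),))
    # Nested attributes are queried with path notation -- no fixed schema required.
    cur.execute(
        "SELECT payload:order_id::int, payload:customer.country::string FROM raw_orders"
    )
    print(cur.fetchall())
finally:
    conn.close()
```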
  11. The most in-depth guide to Snowflake pricing: take a look at this technical deep dive. View the full article
  12. Data transformation is the process of converting data from one format to another, the “T” in ELT, or extract, load, transform, which enables organizations to get their data analytics-ready and derive insights and value from it. As companies collect more data, from disparate sources and in disparate formats, building and managing transformations has become exponentially more complex and time-consuming. The Snowflake Data Cloud includes powerful capabilities for transforming data and orchestrating data pipelines, and we partner with best-in-class providers to give customers a choice in the data transformation technologies they use. Today, we are excited to announce that Snowflake Ventures is investing in our partner, Coalesce, which offers an intuitive, low-code transformation platform for developing and managing data pipelines. The Coalesce platform is uniquely built for Snowflake. Coalesce allows data teams to build complex transformations quickly and efficiently without deep coding expertise, while still providing all the extensibility the most technical Snowflake users will need. This expands the number of users who can contribute to data projects and enhances collaboration. Coalesce automatically generates Snowflake-native SQL and supports Snowflake data engineering features such as Snowpark, Dynamic Tables, AI/ML capabilities, and more. Our investment helps Coalesce to continue providing first-class experiences for Snowflake users, including integrating closely to take advantage of the latest Data Cloud innovations. Coalesce will also lean into Snowpark Container Services and the Snowflake Native App Framework to provide a seamless user experience. With Snowflake Native Apps, customers can instantly deploy Coalesce on their Snowflake account and transact directly through Snowflake Marketplace. Our goal at Snowflake is to provide developers, data engineers, and other users with optimal choice in the tools they use to prepare and manage data. We will continue to add new transformation capabilities to the Data Cloud and look forward to working with Coalesce to provide the best possible experience for transforming data so organizations can unlock the full potential of their data. The post Snowflake Ventures Invests in Coalesce to Enable Simplified Data Transformation Development and Management Natively on the Data Cloud appeared first on Snowflake. View the full article
  13. In December 2023, Snowflake announced its acquisition of data clean room technology provider Samooha. Samooha’s intuitive UI and focus on reducing the complexity of sharing data led to it being named one of the most innovative data science companies of 2024 by Fast Company. Now, Samooha’s offering is integrated into Snowflake and launched as Snowflake Data Clean Rooms, a Snowflake Native App on Snowflake Marketplace, generally available to customers in AWS East, AWS West and Azure West. Snowflake Data Clean Rooms make it easy to build and use data clean rooms in Snowflake, with no additional access fees set by Snowflake.

What is a data clean room?

Data clean rooms provide a controlled environment that allows multiple companies, or divisions of a company, to securely collaborate on sensitive or regulated data while fully preserving the privacy of the enterprise data. Enterprises should not have to make challenging trade-offs between following compliance regulations and making sensitive data available for collaboration. With data clean rooms, organizations have an opportunity to unlock the value of sensitive data by allowing for joint data analytics, machine learning and AI by anonymizing, processing and storing personally identifiable information (PII) in a compliant way. Data clean rooms allow for multiple parties to securely collaborate on sensitive or regulated data, surfacing valuable insights while preserving the privacy of the data.

How does a data clean room work?

Data clean rooms can be used to control the following:
• What data comes into the clean room
• How the data in the clean room can be joined to other data in the clean room
• What types of analytics each party can perform on the data
• What data, if any, can leave the clean room

Any sensitive or regulated data, such as PII, that is loaded into the clean room is encrypted. The clean room provider has full control over the clean room environment, while approved partners can get a feed with anonymized data.

Why Snowflake Data Clean Rooms?

Until now, data clean room technology was generally deployed by large organizations with access to technical data privacy experts. Snowflake Data Clean Rooms remove the technical and financial barriers, allowing companies of all sizes to easily build, use and benefit from data clean rooms.

Unlock value with data clean rooms easily and at no additional license cost

Teams can stand up new data clean rooms quickly, easily and at no additional license fees through an app that is available on Snowflake Marketplace. Built for business and technical users alike, Snowflake Data Clean Rooms allow organizations to unlock value from data faster with industry-specific workflows and templates such as audience overlap, reach and frequency, last touch attribution and more. As a Snowflake Native App, Snowflake Data Clean Rooms makes it easy for technical and business users to build and use data clean rooms in Snowflake.

Tap into the open and interoperable ecosystem of the Snowflake Data Cloud

The Snowflake Data Cloud provides an open, neutral and interoperable data clean room ecosystem that allows organizations to collaborate with all their partners seamlessly, regardless of whether they have their own Snowflake accounts. Companies can also leverage turnkey third-party integrations and solutions for data enrichment, identity, activation and more across providers.
Snowflake Data Clean Rooms allows you to collaborate with your partners seamlessly across regions and clouds thanks to Cross-Cloud Snowgrid (Snowflake Data Clean Rooms is currently available in AWS East/West and Azure West). It provides a cross-cloud technology layer that allows you to interconnect your business’ ecosystems across regions and clouds and operate at scale.

Take advantage of Snowflake’s built-in privacy and governance features

Unlock privacy-enhanced collaboration on your sensitive data through an app built on the Snowflake Native App Framework. By bringing the clean room solution to your data, Snowflake Data Clean Rooms removes the need for data to ever leave the governance, security and privacy parameters of Snowflake. By leveraging Snowpark for AI/ML, cryptographic compute support, differential privacy models, security attestation guarantees and more, Snowflake Data Clean Rooms helps companies maintain privacy while allowing for deeper analytical insight with business partners. You can easily integrate data and activation partners to realize use cases in marketing, advertising, and across other industries. Snowflake Data Clean Rooms is a Snowflake Native App that runs directly in your Snowflake account, eliminating the need to move or copy data out of the governance, security and privacy parameters of Snowflake.

Data clean room use cases across industries

Key use cases for data clean rooms are found in marketing, media and advertising. However, organizations across industries are realizing value with data clean rooms, including financial services and healthcare and life sciences.

Attribution for advertising and marketing

One popular use case for data clean rooms is to link anonymized marketing and advertising data from multiple parties for attribution. Suppose a company has first-party data containing attributes about its customers and their associated sales SKUs; it can use a data clean room to improve audience insights for advertising. Let’s say the company wants to find new customers with the same attributes as its best customers, and combine those attributes with other characteristics to drive upsell opportunities. To create the target segments and comply with privacy requirements, the company uploads its data into a clean room that it creates or that is shared by its ad partner. Participants can securely join any first-party data without exposing IDs. Without a data clean room, only limited amounts of data could flow between the various parties due to data privacy, regulations and competitive concerns.

Measurement for advertising and marketing

Another key data clean room use case is the measurement of the effectiveness of advertising and marketing campaigns. Advertisers want to understand who saw an advertisement, for example, as well as who engaged with it. This information will be distributed across the different media partners it takes to serve an ad to a consumer. Creating a joint analysis across the data of these different media partners is important for advertisers to understand campaign results and to optimize future campaigns. Such measurement can only be realized through a data clean room, as it protects the sensitivity of the consumer data across all parties while surfacing valuable analytical insights.

Monetizing proprietary data

The omnichannel customer journey is complex, and it rarely starts with a brand’s advertisement.
For example, if a consumer is planning an upcoming purchase of a kitchen appliance, the journey is likely to start with online review sites. A reviews site collects top-of-funnel data that would be invaluable to the appliance brand. With a data clean room, the reviews website could create a compliant third-party data product, manage access to it through the clean room, and monetize it.

Consumer goods-retail collaboration

Data clean rooms allow retailers and consumer goods companies to collaborate with brands that advertise with them. For example, a retailer can share transaction data in a privacy- and governance-friendly manner to provide insights into conversion signals and enable better targeting, personalization and attribution.

Enhancing financial service customer data

Similar to use cases in marketing, data clean rooms enable financial institutions to securely collaborate across a variety of use cases such as credit fraud modeling and money laundering detection. Sensitive financial consumer data can be enhanced with second- and third-party data sources and analyzed across institutional boundaries to detect anomalous patterns and behaviors, all while protecting consumer data privacy.

Enriching patient health data

In healthcare and life sciences, a hospital can use data clean rooms to share regulated patient data with a pharmaceutical company. The company can enrich and analyze the data to identify patterns in patient outcomes across clinical trials. The data clean room environment enables the patient data to remain private while still contributing to meaningful insights.

Learn more about Snowflake Data Clean Rooms

Get started today with Snowflake Data Clean Rooms: visit the listing on Snowflake Marketplace for additional details. To see a demo of Snowflake Data Clean Rooms, register for Snowflake’s virtual Accelerate Advertising, Media, & Entertainment event and learn how media and advertising organizations collaborate in the Media Data Cloud to enhance business growth and data monetization, develop new products, and harness the power of AI and ML. The post Snowflake Data Clean Rooms: Securely Collaborate to Unlock Insights and Value appeared first on Snowflake. View the full article
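To make the audience-overlap use case above more tangible, here is a conceptual sketch of the kind of aggregate-only analysis a clean room template runs. It is not the Snowflake Data Clean Rooms API; the tables, columns and thresholds are hypothetical, and it assumes the snowflake-snowpark-python package.

```python
# Conceptual sketch only -- NOT the Snowflake Data Clean Rooms API. Both parties match
# on hashed identifiers and only aggregate counts are released, which mirrors the
# "audience overlap" template described above. All names/thresholds are hypothetical.
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "my_account", "user": "analyst", "password": "***",
    "warehouse": "ANALYTICS_WH", "database": "COLLAB_DB", "schema": "SHARED",
}).create()

overlap = session.sql("""
    SELECT a.segment,
           COUNT(DISTINCT a.hashed_email) AS overlapping_customers
    FROM   advertiser_customers a                 -- party 1: hashed first-party IDs
    JOIN   publisher_audience  p                  -- party 2: hashed audience IDs
           ON a.hashed_email = p.hashed_email
    GROUP  BY a.segment
    HAVING COUNT(DISTINCT a.hashed_email) >= 50   -- release only sufficiently large groups
""")
overlap.show()
```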
  14. As organizations seek to drive more value from their data, observability plays a vital role in ensuring the performance, security and reliability of applications and pipelines while helping to reduce costs. At Snowflake, we aim to provide developers and engineers with the best possible observability experience to monitor and manage their Snowflake environment. One of our partners in this area is Observe, which offers a SaaS observability product that is built and operated on the Data Cloud. We’re excited to announce today that Snowflake Ventures is making an investment in Observe to significantly expand the observability experience we provide for our customers. Following the investment, Observe plans to develop best-in-class observability features that will help our customers monitor and manage their Snowflake environments even more effectively. Solutions such as out-of-the-box dashboards and new visualizations will empower developers and engineers to accelerate their work and troubleshoot problems more quickly and easily. In addition, because Observe is built on the Data Cloud, our customers will have the option to keep their observability data within their Snowflake account instead of sending it out to a third-party provider. This further simplifies and enhances their data governance by allowing them to keep more of their data within the secure environment of their Snowflake account. Observe is an example of how more companies are building and operating SaaS applications on the Data Cloud. By doing so, these companies gain access to our scalable infrastructure and powerful analytics while being able to offer a more advanced and differentiated experience to Snowflake customers. We will continue to expand the signals we provide for developers and engineers to manage, monitor and troubleshoot their workloads in the Data Cloud. Our partnerships with companies like Observe help turn signals into actionable insights that are presented in compelling and innovative ways. The post Snowflake Invests in Observe to Expand Observability in the Data Cloud appeared first on Snowflake. View the full article
  15. Snowflake is committed to helping our customers unlock the power of artificial intelligence (AI) to drive better decisions, improve productivity and reach more customers using all types of data. Large Language Models (LLMs) are a critical component of generative AI applications, and multimodal models are an exciting category that allows users to go beyond text and incorporate images and video into their prompts to get a better understanding of the context and meaning of the data. Today we are excited to announce we’re furthering our partnership with Reka to support its suite of highly capable multimodal models in Snowflake Cortex. This includes Flash, an optimized model for everyday questions, as well as developing support for Core, Reka’s largest and most performant model. This will allow our customers to seamlessly unlock value from more types of data with the power of multimodal AI in the same environment where their data lives, protected by the built-in security and governance of the Snowflake Data Cloud. Reka’s latest testing reveals that both Flash and Core are highly capable, with Core’s capabilities approaching GPT-4 and Gemini Ultra, making it one of the most capable LLMs available today. In addition to expanding our partnership with NVIDIA to power gen AI applications and enhance model performance and scalability, our partnerships with Reka and other LLM providers are the latest examples of how Snowflake is accelerating our AI capabilities for customers. Snowflake remains steadfast in our commitment to make AI secure, easy to use and quick to implement, for both business and technical users. Taken together, our partnerships and investments in AI ensure we continue to provide customers with maximum choice around the tools and technologies they need to build powerful AI applications. The post Snowflake Brings Gen AI to Images, Video and More With Multimodal Language Models from Reka in Snowflake Cortex appeared first on Snowflake. View the full article
  16. Performance tuning in Snowflake is the process of optimizing configuration settings and SQL queries to improve the efficiency and speed of data operations. It involves adjusting various settings and rewriting queries to reduce execution time and resource consumption, ultimately leading to cost savings and enhanced user satisfaction. Performance tuning is crucial in Snowflake for several reasons: View the full article
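A common first step in this kind of tuning is simply finding where the time goes. The hedged sketch below (not from the article) pulls the slowest recent queries from the standard SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view; connection details are placeholders.

```python
# Hedged example: list the slowest queries of the past week as a starting point for
# performance tuning. Uses the ACCOUNT_USAGE.QUERY_HISTORY view; credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***", warehouse="ADMIN_WH",
)
try:
    cur = conn.cursor()
    cur.execute("""
        SELECT query_id,
               warehouse_name,
               total_elapsed_time / 1000 AS elapsed_s,
               bytes_spilled_to_local_storage,   -- spilling often signals an undersized warehouse
               partitions_scanned,
               partitions_total                  -- scanned vs. total hints at pruning problems
        FROM snowflake.account_usage.query_history
        WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
        ORDER BY total_elapsed_time DESC
        LIMIT 20
    """)
    for row in cur:
        print(row)
finally:
    conn.close()
```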
  17. As Large Language Models are revolutionizing natural language prompts, Large Vision Models (LVMs) represent another new, exciting frontier for AI. An estimated 90% of the world’s data is unstructured, much of it in the form of visual content such as images and videos. Insights from analyzing this visual data can open up powerful new use cases that significantly boost productivity and efficiency, but enterprises need sophisticated computer vision technologies to achieve this. One of the leaders in this area is Landing AI, a company founded by globally recognized AI expert Andrew Ng. Landing AI provides an intuitive software platform that allows enterprises to leverage generative AI for computer vision technologies at scale, to uncover new insights and drive innovation from proprietary image and video data. Snowflake and Landing AI are already partners, and we’re excited to announce today that we are deepening our relationship with an investment in Landing AI. Through this next phase of partnership, customers will be able to leverage Landing AI’s computer vision capabilities natively on Snowflake via Snowpark Container Services and Snowflake Native Apps. This will enable the building of powerful, custom-built computer vision solutions that process images and videos at scale, all within the secure, governed boundary of the Data Cloud. The use cases for AI and computer vision are as varied as they are powerful. LandingLens is built from the ground up for enterprise use cases, with an intuitive workflow that allows data teams to fine-tune solutions for their specific industry needs. Manufacturers use computer vision for quality inspection, robotic assembly and defect detection. Retailers apply the technology to enable automated checkout and shelf inventory checks. The pharma industry employs computer vision for drug discovery, inspecting medications and cell analysis. In each case, computer vision and AI work in tandem to greatly increase efficiency and reduce costs. Through our investment in Landing AI, we are excited to further the promise of computer vision for our customers across industries. The post Snowflake Ventures Invests in Landing AI, Boosting Visual AI in the Data Cloud appeared first on Snowflake. View the full article
  18. Because human-machine interaction using natural language is now possible with large language models (LLMs), more data teams and developers can bring AI to their daily workflows. To do this efficiently and securely, teams must decide how they want to combine the knowledge of pre-trained LLMs with their organization’s private enterprise data in order to deal with the hallucinations (that is, incorrect responses) that LLMs can generate due to the fact that they’ve only been trained on data available up to a certain date. To reduce these AI hallucinations, LLMs can be combined with private data sets via processes that either don’t require LLM customization (such as prompt engineering or retrieval augmented generation) or that do require customization (like fine-tuning or retraining). To decide where to start, it is important to make trade-offs between the resources and time it takes to customize AI models and the required timelines to show ROI on generative AI investments. While every organization should keep both options on the table, to quickly deliver value, the key is to identify and deploy use cases that can deliver value using prompt engineering and retrieval augmented generation (RAG), as these can be fast and cost-effective approaches to get value from enterprise data with LLMs. To empower organizations to deliver fast wins with generative AI while keeping data secure when using LLMs, we are excited to announce Snowflake Cortex LLM functions are now available in public preview for select AWS and Azure regions. With Snowflake Cortex, a fully managed service that runs on NVIDIA GPU-accelerated compute, there is no need to set up integrations, manage infrastructure or move data outside of the Snowflake governance boundary to use the power of industry-leading LLMs from Mistral AI, Meta and more. So how does Snowflake Cortex make AI easy, whether you are doing prompt engineering or RAG? Let’s dive into the details and check out some code along the way.

To prompt or not to prompt

In Snowflake Cortex, there are task-specific functions that work out of the box without the need to define a prompt. Specifically, teams can quickly and cost-effectively execute tasks such as translation, sentiment analysis and summarization. All that an analyst or any other user familiar with SQL needs to do is point the specific function below to a column of a table containing text data and voila! Snowflake Cortex functions take care of the rest — no manual orchestration, data formatting or infrastructure to manage. This is particularly useful for teams constantly working with product reviews, surveys, call transcripts and other long-text data sources traditionally underutilized within marketing, sales and customer support teams.

      SELECT SNOWFLAKE.CORTEX.SUMMARIZE(review_text) FROM reviews_table LIMIT 10;

Of course, there are going to be many use cases where customization via prompts becomes useful. For example:
• Custom text summaries in JSON format
• Turning email domains into rich data sets
• Building data quality agents using LLMs

All of these and more can quickly be accomplished with the power of industry-leading foundation models from Mistral AI (Mistral Large, Mistral 8x7B, Mistral 7B), Google (Gemma-7b) and Meta (Llama2 70B). All of these foundation LLMs are accessible via the COMPLETE function, which just like any other Snowflake Cortex function can run on a table with multiple rows without any manual orchestration or LLM throughput management.
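The same functions are also exposed to Python. The sketch below is a hedged companion to the SQL above: it assumes the snowflake-ml-python package, whose snowflake.cortex module wraps these functions (module and function names as of the public preview), and the table, column and connection names are placeholders.

```python
# Hedged sketch: calling Cortex task-specific functions from Python via snowflake.cortex
# (shipped in snowflake-ml-python). Names/credentials are placeholders; verify the module
# path and function signatures against the current Cortex documentation.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
from snowflake.cortex import Sentiment, Summarize

session = Session.builder.configs({
    "account": "my_account", "user": "my_user", "password": "***",
    "warehouse": "CORTEX_WH", "database": "DEMO_DB", "schema": "PUBLIC",
}).create()

reviews = session.table("reviews_table").limit(10)

# Task-specific functions need no prompt; they translate to the same SQL shown above.
scored = reviews.select(
    col("review_text"),
    Summarize(col("review_text")).alias("summary"),
    Sentiment(col("review_text")).alias("sentiment"),
)
scored.show()
```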
Figure 1: Multi-task accuracy of industry-leading LLMs based on the MMLU benchmark. Source

SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    CONCAT('Summarize this product review in less than 100 words. Put the product name, defect and summary in JSON format: <review>', content, '</review>')
) FROM reviews LIMIT 10;

For use cases such as chatbots on top of documents, it may be costly to put all the documents as context in the prompt. In such a scenario, a different approach can be more cost-effective: minimizing the volume of tokens (a general rule of thumb is that 75 words approximately equals 100 tokens) going into the LLM. A popular framework that solves this problem without requiring changes to the LLM is RAG, which is easy to do in Snowflake.

What is RAG?

Let's go over the basics of RAG before jumping into how to do this in Snowflake. RAG is a popular framework in which an LLM gets access to a specific knowledge base with the most up-to-date, accurate information available before generating a response. Because there is no need to retrain the model, this extends the capability of any LLM to specific domains in a cost-effective way. To deploy this retrieval, augmentation and generation framework, teams need a combination of:

Client / app UI: This is where the end user, such as a business decision-maker, interacts with the knowledge base, typically in the form of a chat service.

Context repository: This is where relevant data sources are aggregated, governed and continuously updated as needed to provide an up-to-date knowledge repository. This content needs to be inserted into an automated pipeline that chunks (that is, breaks documents into smaller pieces) and embeds the text into a vector store.

Vector search: This requires the combination of a vector store, which maintains the numerical or vector representation of the knowledge base, and semantic search to provide easy retrieval of the chunks most relevant to the question.

LLM inference: The combination of these enables teams to embed the question and the context to find the most relevant information and generate contextualized responses using a conversational LLM.

Figure 2: Generalized RAG framework from question to contextualized answer.

From RAG to rich LLM apps in minutes with Snowflake Cortex

Now that we understand how RAG works in general, how can we apply it to Snowflake? Using the Snowflake platform's rich foundation for data governance and management, which includes the VECTOR data type (in private preview), developing and deploying an end-to-end AI app using RAG is possible without integrations, infrastructure management or data movement, using three key features:

Figure 3: Key Snowflake features needed to build end-to-end RAG in Snowflake.

Here is how these features map to the key architecture components of a RAG framework:

Client / app UI: Use Streamlit in Snowflake's out-of-the-box chat elements to quickly build and share user interfaces, all in Python.

Context repository: The knowledge repository can be easily updated and governed using Snowflake stages. Once documents are loaded, all of your data preparation, including generating chunks (smaller, contextually rich blocks of text), can be done with Snowpark. For the chunking in particular, teams can seamlessly use LangChain as part of a Snowpark User Defined Function.

Vector search: Thanks to the native support of VECTOR as a data type in Snowflake, there is no need to integrate and govern a separate store or service.
Store VECTOR data in Snowflake tables and execute similarity queries with system-defined similarity functions (L2, cosine or inner-product distance); a hedged SQL sketch of this retrieval step appears at the end of this post.

LLM inference: Snowflake Cortex completes the workflow with serverless functions for embedding and text completion inference (using Mistral AI, Llama or Gemma LLMs).

Figure 4: End-to-end RAG framework in Snowflake.

Show me the code

Ready to try Snowflake Cortex and its tightly integrated ecosystem of features that enable fast prototyping and agile deployment of AI apps in Snowflake? Get started with one of these resources:

Snowflake Cortex LLM functions documentation
Run 3 useful LLM inference jobs in 10 minutes with Snowflake Cortex
Build a chat-with-your-documents LLM app using RAG with Snowflake Cortex

To watch live demos and ask questions of Snowflake Cortex experts, sign up for one of these events:

Snowflake Cortex Live Ask Me Anything (AMA)
Snowflake Cortex RAG hands-on lab

Want to network with peers and learn from other industry and Snowflake experts about how to use the latest generative AI features? Make sure to join us at Snowflake Data Cloud Summit in San Francisco this June!

The post Easy and Secure LLM Inference and Retrieval Augmented Generation (RAG) Using Snowflake Cortex appeared first on Snowflake. View the full article
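As referenced in the vector search step above, here is a minimal, hedged sketch of the retrieval-plus-generation query. It assumes a doc_chunks table whose chunk_vec column is a VECTOR(FLOAT, 768) populated by an embedding pipeline; the table, columns and question text are hypothetical, and the EMBED_TEXT_768 function, the 'e5-base-v2' model and VECTOR_COSINE_SIMILARITY follow Snowflake's documentation, which may differ slightly from the preview described in this post.

-- Embed the user question, retrieve the three most similar chunks, and ask the
-- LLM to answer using only that retrieved context.
WITH question AS (
    SELECT SNOWFLAKE.CORTEX.EMBED_TEXT_768('e5-base-v2',
             'How do I rotate my API keys?') AS q_vec
),
top_chunks AS (
    SELECT c.chunk_text
    FROM doc_chunks AS c, question AS q
    ORDER BY VECTOR_COSINE_SIMILARITY(c.chunk_vec, q.q_vec) DESC
    LIMIT 3
)
SELECT SNOWFLAKE.CORTEX.COMPLETE(
    'mistral-large',
    CONCAT('Answer the question using only this context: ',
           (SELECT LISTAGG(chunk_text, '\n') FROM top_chunks),
           '\nQuestion: How do I rotate my API keys?')
) AS answer;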
  19. Welcome to Snowflake's Startup Spotlight, where we learn about companies building their businesses on Snowflake. In this edition, we'll hear how Maria Marti, founder and CEO of ZeroError, used her experiences as an engineer and an executive to build a team and create the AI analytics assistant she always wanted — but never had.

What inspires you as a founder?

My team and my customers. The passion that my team puts into everything we do, and the looks in our customers' eyes when they see what ZeroError can do for them — how it solves a real issue for them.

Explain ZeroError in one sentence.

ZeroError is an enterprise AI platform designed to detect errors and fraud in data.

What problem does ZeroError aim to solve? How did you identify that issue?

There are two critical moments when you are on a data-driven team: one, when you need to make decisions with data you received; and two, when data needs to leave the organization. There is no room for mistakes. When you are presenting to the Board, or talking with regulators, your data has to be perfect. Today, we solve this the same way we did 20 years ago — with a lot of manual processes. Current data catalogs and quality tools can take a long time to implement. The user needs to pre-define almost everything, and the tools generally don't have the flexibility to adapt to a very dynamic data environment.

That's why we created ZeroError. ZeroError is the application that I needed in my past roles as an executive in Fortune 100 companies, but it did not exist. ZeroError applies the power of AI to help improve data quality and analytics. Our proprietary AI detects complex data anomalies without human input and without defining rules. It is an executive-centric application for critical and timely decision-making, reporting and controls.

As a founder and innovator, what is your take on the rapidly changing AI landscape?

We are so fortunate to live in this exciting time. I truly believe that AI is not here to replace humans, but to help us achieve more and make us more efficient. At our heart, we are an AI company. ZeroError is your AI assistant for analytics, and I think that's super cool. We developed our proprietary AI and integrated other LLMs into our offering. We're launching AI copilots focused on specific business solutions and customer needs. Our approach is different because we have a unique combination of knowing the issue we are addressing inside and out, the team's years of experience as agents of change in large organizations, our superb tech skills, and loving what we are doing. Passion is everything!

How is the AI surge changing what businesses expect from their platforms and tools?

Expectations are definitely very high. In the case of ZeroError, I would say that businesses are surprised by what we can achieve with AI — but at the same time, they expect to be surprised. Many AI applications, especially those focused on business intelligence, "assume" that the data they receive is perfect. That is a huge assumption. Any good BI needs good data; you have to understand and trust your data quality to be able to trust the output of your apps. During Mobile World Congress 2024 in Barcelona, we got so much attention and traction because ZeroError is a platform that tells the user if they can trust the data.

At the company's Mobile World Congress 2024 booth, ZeroError Founder and CEO Maria Marti saw firsthand how businesses have high expectations for AI applications.

How do you leverage Snowflake to push the envelope in your industry?
Working with Snowflake has been a critical building block. Snowflake makes our business lives easier, allowing us to focus on what we are really good at. The Snowflake platform is tremendously intuitive, and gives us the flexibility to grow and to structure our Snowflake instance in a way that fits our changing needs. For a startup, time is everything: you need to go fast, iterate and continue to innovate. Snowflake has been key in accelerating our time to market. The cost transparency is critical for any startup, and we share the common passion of putting customers at the center of our businesses.

What's the most valuable piece of advice you got about how to run a startup?

Always be true to yourself and remember why you created the company, especially in the difficult moments. That came from my father, who has been an entrepreneur all his life.

What's a lesson you learned the hard way?

All my life I took mistakes as an opportunity to learn. I think that not taking risks is the biggest risk you can take. And of course, if you try new things, you are going to make mistakes, but that is an opportunity to improve and to get better. I always tell my teams and partners: it is when something does not work that the true character of the partnership comes up, and we have an opportunity to differentiate ourselves and become stronger as a team.

Learn more about ZeroError at www.zeroerror.ai. If you're a startup building on Snowflake, check out the Powered by Snowflake Startup Program for info on how Snowflake can support your startup goals.

The post Snowflake Startup Spotlight: ZeroError appeared first on Snowflake. View the full article
  20. Welcome to Snowflake's Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. In this edition, find out how Angad Singh, co-founder and CEO of Chabi, is working to give every company the chance to become data-driven with a modern data stack.

How would you explain Chabi?

Chabi is your all-in-one data stack with a state-of-the-art, built-in data warehouse, ETL, data modeling and personalized analytics tailored to meet your unique data and BI needs. Chabi also provides an extended data team that enables companies to fully realize the value of data-driven insights without needing an in-house data team.

What prompted you to found Chabi?

The entire founding team was part of multiple direct-to-consumer startups before starting Chabi. In all of these startups we built in-house data and analytics infrastructure that helped unlock step-function improvements in growth and operational efficiency throughout the business. We knew businesses of all shapes and sizes look for — and could benefit from — an internal data stack like ours, but most businesses don't have access to the capital, engineering and data expertise needed to build one. And frankly, a lot of businesses end up spending a lot of time and effort on standing up internal data and analytics stacks that become cost centers instead of driving growth and profitability. We knew we could bring a turnkey and highly cost-effective data stack to market that would enable a business to become data-driven with just a few clicks.

What's the story behind your company's name?

"Chabi" is a Hindi word that means a key. We thought that was a great fit for our company because our product unlocks all your data and becomes a personalized operating system for your business.

What do you see as Chabi's role in today's changing data landscape?

Businesses of all shapes and sizes have undergone a digital transformation. A direct result of that has been an order-of-magnitude increase in the volume and complexity of data that needs to be analyzed for a business to operate efficiently and grow. However, unless you are an enterprise-scale company, you likely don't have access to modern data tooling and the technology resources required to implement it, which means becoming a modern, data-driven organization remains out of reach. We want to change that. Chabi gives every company access to a turnkey modern data stack. We're enabling companies across the globe — like a small restaurant chain in California, a VC-backed SaaS startup in Israel or a multimillion-dollar furniture marketplace in the U.S. — to become data-driven.

How do you leverage Snowflake to help you to push the envelope in your line of business?

Snowflake is one of the foundational building blocks of Chabi. Snowflake is the core primitive that powers every feature and capability of the Chabi platform. By abstracting away the complexity of syncing and transforming data for our customers, we are able to bring the cutting-edge capabilities of Snowflake to customers who are typically not even aware of such data technologies. This includes AI and ML capabilities enabled by Snowflake that we have been able to productize for our customers by offering out-of-the-box KPI forecasts and much more.

What's the most valuable piece of advice you got about how to run a startup?

If you are in a position to build your product with customer feedback, lean in on it heavily.
There is a bad reputation around "services"-based software platforms, but if adding a services component helps to get closer to the customer, it is a true win-win.

If you had a chance to go back in time and do something differently as a founder, what would you change?

There is a discipline that comes from running a bootstrapped, cash-flow-positive software business. If I could go back in time to my prior startups, I would raise less funding than we did and use that as a forcing function to become profitable by necessity.

If you'd like to learn more about Chabi and Angad's founder journey, you can find Angad at the Shoptalk event in Las Vegas in March, or visit www.chabi.io. If you're a startup building on Snowflake, check out the Powered by Snowflake Startup Program.

The post Snowflake Startup Spotlight: Chabi appeared first on Snowflake. View the full article
  21. Data Engineering Tools in 2024

The data engineering landscape in 2024 is bustling with innovative tools and evolving trends. Here's an updated perspective on some of the key players and how they can empower your data pipelines:

Data Integration:
Informatica Cloud: Still a leader for advanced data quality and governance, with enhanced cloud-native capabilities.
MuleSoft Anypoint Platform: Continues to shine in building API-based integrations, now with deeper cloud support and security features.
Fivetran: Expands its automated data pipeline creation with pre-built connectors and advanced transformations.
Hevo Data: Remains a strong contender for ease of use and affordability, now offering serverless options for scalability.

Data Warehousing:
Snowflake: Maintains its edge in cloud-based warehousing, with improved performance and broader integrations for analytics.
Google BigQuery: Offers even more cost-effective options for variable workloads, while deepening its integration with other Google Cloud services.
Amazon Redshift: Continues to be a powerful choice for AWS environments, now with an increased focus on security and data governance.
Microsoft Azure Synapse Analytics: Further integrates its data warehousing, lake and analytics capabilities, providing a unified platform for diverse data needs.

Data Processing and Orchestration:
Apache Spark: Remains the reigning champion for large-scale data processing, now with enhanced performance optimizations and broader ecosystem support.
Apache Airflow: Maintains its popularity for workflow orchestration, with improved scalability and user-friendliness.
Databricks: Expands its cloud-based platform for Spark with advanced features like AI integration and real-time streaming.
AWS Glue: Simplifies data processing and ETL within the AWS ecosystem, now with serverless options for cost efficiency.

Emerging Trends:
GitOps: Gaining traction for managing data pipelines with version control and collaboration, ensuring consistency and traceability.
AI and Machine Learning: Increasingly integrated into data engineering tools for automation, anomaly detection and data quality improvement.
Serverless Data Processing: Offering cost-effective and scalable options for event-driven and real-time data processing.

Choosing the right tools:
With this diverse landscape, selecting the right tools depends on your specific needs. Consider factors like:
Data volume and complexity: Match tool capabilities to your data size and structure.
Cloud vs. on-premises: Choose based on your infrastructure preferences and security requirements.
Budget: Evaluate pricing models and potential costs associated with each tool.
Integration needs: Ensure seamless compatibility with your existing data sources and BI tools.
Skill set: Consider the technical expertise required for each tool and the available support resources.

By carefully evaluating your needs and exploring the strengths and limitations of these top contenders, you'll be well equipped to choose the data engineering tools that empower your organization to unlock valuable insights from your data in 2024.

The post Data Engineering Tools in 2024 appeared first on DevOpsSchool.com. View the full article
  22. Docker, in collaboration with Snowflake, introduces an enhanced level of developer productivity when you leverage the power of Docker Desktop with Snowpark Container Services (private preview). At Snowflake BUILD, Docker presented a session showcasing the streamlined process of building, iterating, and efficiently managing data through containerization within Snowflake using Snowpark Container Services. Watch the session to learn more about how this collaboration helps streamline development and application innovation with Docker, and read on for more details.

Docker Desktop with Snowpark Container Services helps empower developers, data engineers, and data scientists with the tools and insights needed to seamlessly navigate the intricacies of incorporating data, including AI/ML, into their workflows. Furthermore, the advancements in Docker AI within the development ecosystem promise to elevate GenAI development efforts now and in the future. Through the collaborative efforts showcased between Docker and Snowflake, we aim to continue supporting and guiding developers, data engineers, and data scientists in leveraging these technologies effectively.

Accelerating deployment of data workloads with Docker and Snowpark

Why is Docker, a containerization platform, collaborating with Snowflake, a data-as-a-service company? Many organizations lack formal coordination between data and engineering teams, meaning every change might have to go through DevOps, slowing project delivery. Docker Desktop and Snowpark Container Services (private preview) improve collaboration between developers and data teams. This collaboration allows data and engineering teams to work together, removing barriers to enable:

Ownership, by streamlining development and deployment
Independence, by removing traditional dependence on engineering stacks
Efficiency, by reducing resources and improving cross-team coordination

With the growing number of applications that rely on data, Docker is invested in ensuring that containerization supports the changing development landscape to provide consistent value within your organization.

Streamlining Snowpark deployments with Docker Desktop

Docker Desktop provides many benefits to data teams, including improving data ingestion and enrichment and smoothing common workarounds when working with a data stack. Watch the video from Snowflake BUILD for a demo showing the power of Docker Desktop and Snowpark Container Services working together. We walk through:

How to create a Docker image using Docker Desktop, to help you drive consistency by encapsulating your code, libraries, dependencies, and configurations in an image.
How to push that image to a registry, to make it portable and available to others with the correct permissions.
How to run the container as a job in Snowpark Container Services, to help you scale your work with versioning and distributed deployments. (A hedged sketch of the Snowflake-side SQL for these steps follows at the end of this section.)

Using Docker Desktop with Snowpark Container Services provides an enhanced development experience for data engineers who can develop in one environment and deploy in another. For example, with Docker Desktop you can build on an Arm64 platform, yet deploy to Snowpark, an AMD64 platform. This is made possible by multi-platform images, so you can have a great local development environment and still deploy to Snowpark without any difficulty.
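To complement the Docker-side demo described above, here is a hedged sketch of the Snowflake-side SQL involved in the third step. All object names are hypothetical, and because Snowpark Container Services was in private preview at the time of this post, the EXECUTE JOB SERVICE syntax shown here follows Snowflake's later tutorials and may differ from the preview; treat it as an outline rather than a definitive recipe.

-- A repository to docker push the image into, and a compute pool to run it on.
CREATE IMAGE REPOSITORY IF NOT EXISTS my_db.my_schema.my_repo;

CREATE COMPUTE POOL IF NOT EXISTS my_pool
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = CPU_X64_XS;

-- Run the pushed image as a job, using a service specification file that was
-- uploaded to a stage (the spec file references the image in my_repo).
EXECUTE JOB SERVICE
  IN COMPUTE POOL my_pool
  NAME = my_data_job
  FROM @my_db.my_schema.specs
  SPECIFICATION_FILE = 'job_spec.yaml';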
Boosting developer productivity with Docker AI

In alignment with Docker's mission to increase the time developers spend on innovation and decrease the time they spend on everything else, Docker AI assists in streamlining the development lifecycle for both development and data teams. Docker AI, available in early access now, aims to simplify current tasks, boosting developer productivity by offering context-specific, automated guidance.

When using Snowpark Container Services, deploying the project to Snowpark is the next step once you've built your image. Leveraging its model trained on Snowpark documentation, Docker AI offers relevant recommendations within your project's context. For example, it autocompletes Dockerfiles with best-practice suggestions and continually updates recommendations as projects evolve and security measures change. This marks Docker's initial phase of aiding the community in simplifying the use of big data and implementing context-specific AI guidance across the software development lifecycle.

Despite the rising complexity of projects involving vast data sets, Docker AI provides support, streamlining processes and enhancing your experience throughout the development lifecycle. Docker AI aims to deliver tailored, automated advice during Dockerfile or Docker Compose editing, local docker build debugging, and local testing. Docker AI leverages the wealth of knowledge from millions of long-time Docker users to autogenerate best practices and recommend secure, updated images. With Docker AI, developers can spend more time innovating their applications and less time on tools and infrastructure. Sign up for the Docker AI Early Access Program now.

Improving collaboration across development and data teams

Our continued investment in Docker Desktop and Docker AI, along with key collaborators like Snowflake, helps you streamline the process of building, iterating, and efficiently managing data through containerization. Download Docker Desktop to get started today. Check with your admins — you may be surprised to find out your organization is already using Docker!

Learn more

Review the Snowpark Container Services GitHub documentation.
Follow the Snowflake tutorial to leverage your Snowflake data and build a Docker image.
Learn more about LLMs and Hugging Face.
Sign up for the Docker AI Early Access Program.

View the full article
  23. Snowflake is announcing new product capabilities that are changing how developers build, deliver, distribute and operate their applications. These new features include programming language and hardware flexibility from Snowpark Container Services; the ability to build, distribute and monetize full-stack apps with the Snowflake Native App Framework; the ability to leverage transactional and analytical data together with Hybrid Tables; and DevOps capabilities including database change management, Git integration, the Snowflake CLI and Event Tables. These features collectively help developers build more quickly within a unified platform, distribute products globally, deliver them securely, and scale without operational burden... View the full article
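Of the capabilities listed above, Hybrid Tables are the easiest to show in a few lines. The sketch below is illustrative only: the table and values are made up, and because Hybrid Tables were still in preview when this was announced, the exact syntax should be checked against current documentation.

-- A hybrid table serves row-level transactional operations and analytical
-- queries over the same data; a primary key is required.
CREATE HYBRID TABLE order_status (
    order_id    INT PRIMARY KEY,
    customer_id INT,
    status      VARCHAR(20)
);

-- Operational point writes and lookups...
INSERT INTO order_status VALUES (1001, 42, 'PLACED');
UPDATE order_status SET status = 'SHIPPED' WHERE order_id = 1001;

-- ...alongside analytics over the same table.
SELECT status, COUNT(*) AS orders FROM order_status GROUP BY status;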
  24. Snowflake has invested heavily in extending the Data Cloud to AI/ML workloads, starting in 2021 with the introduction of Snowpark, the set of libraries and runtimes in Snowflake that securely deploy and process Python and other popular programming languages. Since then, we've significantly opened up the ways Snowflake's platform, including its elastic compute engine, can be used to accelerate the path from AI/ML development to production. Because Snowpark takes advantage of the scale and performance of Snowflake's logically integrated but physically separated storage and compute, our customers are seeing a median of 3.5 times faster performance and 34% lower costs for their AI/ML and data engineering use cases. As of September 2023, we've already seen many organizations benefit from bringing processing directly to the data, with over 35% of Snowflake customers using Snowpark on a weekly basis. To further accelerate the entire ML workflow from development to production, the Snowflake platform continues to evolve with a new development interface and more functionality to securely productionize both features and models. Let's unpack these announcements! ... View the full article
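As a small illustration of what "bringing processing directly to the data" looks like in practice, here is a sketch of a Python handler registered as a UDF through SQL and called inline in a query. The function, table and column names are hypothetical; this only shows the general shape of a Snowpark-style Python UDF, not an implementation from the post above.

-- Register a simple Python UDF (a hypothetical example).
CREATE OR REPLACE FUNCTION sentiment_bucket(score FLOAT)
  RETURNS VARCHAR
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.8'
  HANDLER = 'bucket'
AS
$$
def bucket(score):
    # Map a numeric sentiment score to a coarse label.
    if score is None:
        return 'unknown'
    if score > 0.3:
        return 'positive'
    if score < -0.3:
        return 'negative'
    return 'neutral'
$$;

-- Call it next to the data, inside an ordinary SQL query.
SELECT review_id, sentiment_bucket(sentiment_score) AS label
FROM product_reviews;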
  25. We are excited to announce the general availability of Snowflake Event Tables for logging and tracing, an essential feature to boost application observability and supportability for Snowflake developers. In our conversations with developers over the last year, we've heard that monitoring and observability are paramount to effectively developing and monitoring applications. But previously, developers didn't have a centralized, straightforward way to capture application logs and traces.

Enter the new Event Tables feature, which helps developers and data engineers easily instrument their code to capture and analyze logs and traces for all languages: Java, Scala, JavaScript, Python and Snowflake Scripting. With Event Tables, developers can instrument logs and traces from their UDFs, UDTFs, stored procedures, Snowflake Native Apps and Snowpark Container Services, then seamlessly route them to a secure, customer-owned Event Table. Developers can then query Event Tables to troubleshoot their applications or gain insights into performance and code behavior. (A hedged sketch of the setup and a sample query appears at the end of this post.) Logs and traces are collected and propagated via Snowflake's telemetry APIs, then automatically ingested into your Snowflake Event Table. Logs and traces are captured in the active Event Table for the account.

Simplify troubleshooting in Native Apps

Event Tables are also supported for Snowflake Native Apps. When a Snowflake Native App runs, it is running in the consumer's account, generating telemetry data that's ingested into their active Event Table. Once the consumer enables event sharing, new telemetry data will be ingested into both the consumer and provider Event Tables. Now the provider has the ability to debug the application that's running in the consumer's account. The provider only sees the telemetry data that is being shared from this data application—nothing else. For native applications, events and logs are shared with the provider only if the consumer enables event sharing.

Improve reliability across a variety of use cases

You can use Event Tables to capture and analyze logs for various use cases:

As a data engineer building UDFs and stored procedures within queries and tasks, you can instrument your code to analyze its behavior based on input data.
As a Snowpark developer, you can instrument logs and traces for your Snowflake applications to troubleshoot and improve their performance and reliability.
As a Snowflake Native App provider, you can analyze logs and traces from various consumers of your applications to troubleshoot and improve performance.

Snowflake customers ranging from Capital One to phData are already using Event Tables to unlock value in their organizations.

"The Event Tables feature simplifies capturing logs in the observability solution we built to monitor the quality and performance of Snowflake data pipelines in Capital One Slingshot," says Yudhish Batra, Distinguished Engineer, Capital One Software. "Event Tables has abstracted the complexity associated with logging from our data pipelines—specifically, the central Event Table gives us the ability to monitor and alert from a single location."

As phData migrates its Spark and Hadoop applications to Snowpark, the Event Tables feature has helped architects save time and hassle. "When working with Snowpark UDFs, some of the logic can become quite complex. In some instances, we had thousands of lines of Java code that needed to be monitored and debugged," says Nick Pileggi, Principal Solutions Architect at phData.
"Before Event Tables, we had almost no way to see what was happening inside the UDF and correct issues. Once we rolled out Event Tables, the amount of time we spent testing dropped significantly and allowed us to have debug and info-level access to the logs we were generating in Java."

One large communications service provider also uses logs in Event Tables to capture and analyze failed records during data ingestion from various external services to Snowflake. And a Snowflake Native App provider offering geolocation data services uses Event Tables to capture logs and traces from their UDFs to improve application reliability and performance.

With Event Tables, you now have a built-in place to easily and consistently manage logging and tracing for your Snowflake applications. And in conjunction with other features such as Snowflake Alerts and Email Notifications, you can be notified of new events and errors in your applications.

Try Event Tables today

To learn more about Event Tables, join us at BUILD, Snowflake's developer conference. Or get started with Event Tables today with a tutorial and quickstarts for logging and tracing. For further information about how Event Tables work, visit the Snowflake product documentation.

The post Collect Logs and Traces From Your Snowflake Applications With Event Tables appeared first on Snowflake. View the full article
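Following up on the setup and query steps referenced in this post, here is a minimal, hedged sketch in SQL. The database, schema and table names are hypothetical, and the RECORD, RECORD_TYPE, VALUE and RESOURCE_ATTRIBUTES columns follow the event table schema in Snowflake's documentation; confirm the details there before relying on them.

-- Create a customer-owned event table and make it the active one for the account.
CREATE EVENT TABLE my_db.telemetry.app_events;
ALTER ACCOUNT SET EVENT_TABLE = my_db.telemetry.app_events;

-- Capture INFO-level messages emitted by UDFs, UDTFs and stored procedures.
ALTER SESSION SET LOG_LEVEL = 'INFO';

-- Inspect recent log records routed into the event table.
SELECT
    timestamp,
    resource_attributes:"snow.executable.name"::string AS executable,
    record:"severity_text"::string AS severity,
    value::string AS message
FROM my_db.telemetry.app_events
WHERE record_type = 'LOG'
  AND timestamp > DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY timestamp DESC;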