Data Engineering & Data Science

Data Engineering

  • Data Pipelines (ETL/ELT)

  • Big Data Technologies

  • Cloud Computing for Data

  • Data Governance & Quality

Data Science

  • Machine Learning (ML)

  • Statistical Analysis

  • Data Visualization

  • Natural Language Processing (NLP)

  1. Two platforms are commonly associated with automating data processes: Fivetran and Supermetrics. Whether you are a fast-paced marketing team that needs to integrate many sources into one destination or a technical team that has to maintain high-traffic ETL processes, deciding between Fivetran vs Supermetrics may be […] View the full article

  2. Nowadays, when it comes to data management, every business has to make one critical decision: whether to use a Data Mesh or a Data Warehouse. Both are strong data management architectures, but they are designed to support different needs and various organizational structures. Selecting the right one can make or break how efficiently you manage […] View the full article

  3. In Snowflake, views are crucial for organizing, selecting, and retrieving data without copying the data itself. If performance is a concern, such as when querying large data sets, Snowflake materialized views are a good fit. In this blog, we’ll explore: Ready? Let’s start by talking about views in general. What are Views? In simple terms, […] View the full article

  4. We are excited to introduce several powerful new capabilities to Mosaic AI Gateway, designed to help our customers accelerate their AI initiatives with... View the full article

  5. Imagine giving your business an intelligent bot to talk to customers. Chatbots are commonly used to talk to customers and provide them with... View the full article

  6. Personalization and scale have historically been mutually exclusive. For all the talk of one-to-one marketing and hyper-personalization, the reality has been that... View the full article

  7. As recently announced at this year’s Data and AI Summit, Databricks AI/BI democratizes business intelligence and analytics across your organization with highly visual... View the full article

  8. Started by Hevo Data,

    Did you know that Netflix is one of the biggest clients for AWS? They did not just push a button when they shifted their entire data infrastructure. It took them seven years to complete the entire migration and ensure that every piece of data moved securely and perfectly into the new system. This shows us […] View the full article

  9. Building an efficient data stack that can handle big data is no small feat, whether due to growing data demands or operational costs. A modern data stack solves these problems by automating and streamlining many data tasks, from sourcing to transformation. In this article, we will detail what a modern data stack is and its […] View the full article

  10. Started by Databricks,

    Over the past three months, I had the opportunity to work as a Product Management Intern on the Ingestion team at Databricks. During... View the full article

  11. Started by Databricks,

    Segmentation projects are the cornerstone of personalization in games. Personalization of the player experience helps maximize player engagement, mitigate churn and increase player... View the full article

  12. Maintaining heavy equipment assets, such as oil rigs, agricultural combines, or fleets of vehicles, poses an extremely complex challenge for global companies. These... View the full article

  13. An improved answer-correctness judge in Agent Evaluation: Agent Evaluation enables Databricks customers to define, measure, and understand how to improve the quality of... View the full article

  14. Nowadays, businesses heavily rely on data to make informed decisions. Choosing the right tool and data management platform can make or break the business. From small startups to large enterprises, handling, storing, and processing one’s data is crucial for all. Two popular platforms available in the market for these purposes are Snowflake and Informatica. Both […] View the full article

  15. ETL tools are very important to a business dealing with varied data sources. An efficient ETL tool provides the platform to migrate data from multiple sources to a single destination and run analytics on it. With the market flooded with ETL tools, choosing which tool is right for your organization can be challenging. This blog […] View the full article

  16. We recently announced the General Availability of our serverless compute offerings for Notebooks, Jobs, and Pipelines. Serverless compute provides rapid workload startup, automatic... View the full article

  17. A data warehouse is a centralized system that stores, integrates, and analyzes large volumes of structured data from various sources. It is predicted that more than 200 zettabytes of data will be stored in the global cloud by 2025. This exponentially growing data becomes a challenge for traditional data warehouses, as they frequently have issues […] View the full article

  18. Recommender systems (RecSys) have become an integral part of modern digital experiences, powering personalized content suggestions across various platforms. These sophisticated systems and... View the full article

  19. Data teams spend way too much time troubleshooting issues, applying patches, and restarting failed workloads. It's not uncommon for engineers to spend their... View the full article

  20. The B2B customer data platform eliminates data silos, allowing companies to embrace personalization and improve efficiency. Learn more here. View the full article

  21. CDPs (customer data platforms) and CRM (customer relationship management) systems have many important differences. Learn how to choose the right one. View the full article

  22. Data pipelines ingest, transform, and deliver data from disparate sources to downstream destinations. Discover everything you need to know here. View the full article

  23. Understanding CDP vs DMP: Is a customer data platform or a data management platform the best software for your business' data management future? View the full article

  24. At Databricks, we aim to make it simple for enterprises to harness data to speed up business processes and enhance decision-making. AI/BI is... View the full article

  25. Building a world that will continue to be enjoyed by future generations requires a shift in the way we operate. At the forefront... View the full article

  26. For over 40 years, Thomas’ central ethos has been that companies can elevate job satisfaction and productivity by better understanding how people interact... View the full article

  27. Terms like “data governance,” “Generative AI” and “large language models” are becoming commonplace in the workplace. But for business leaders, it takes more... View the full article

  28. Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering... View the full article

  29. Managing and orchestrating data workflows efficiently is crucial in today’s data-driven world. As the amount of data increases every day, so does the complexity of the pipelines that process it. Data orchestration deals specifically with the management and coordination of data pipelines to guarantee the free flow of data from one […] View the full article

  30. Started by Databricks,

    The Databricks Marketplace continues to expand and now includes more than 230 data providers and over 2,200 listings. We recently added over forty... View the full article

  31. To operate with the speed, efficiency and productivity that companies are seeking, more employees need accurate, quick and tailored answers to questions about... View the full article

  32. Skechers has been at the forefront of the e-commerce industry, focusing on hyperpersonalized experiences to meet customer expectations better. Following significant growth during... View the full article

  33. In the rapidly evolving landscape of data management, data warehousing continues to be a cornerstone for businesses seeking to harness the power of... View the full article

  34. As a global media conglomerate housing over 37 distinct brands, Condé Nast faced the challenge of delivering targeted consumer experiences across their brands... View the full article

  35. At the time of writing this blogpost, I'm a mere one week away from the end of my summer internship on the Exploratory... View the full article

  36. Started by Databricks,

    Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering... View the full article

  37. Started by Databricks,

    We're thrilled to launch our 2024 Data + AI World Tour, a series of free in-person events in cities worldwide. Each stop... View the full article

  38. When it comes to GenAI in the enterprise, excitement is colliding with reality. Leaders recognize the technology's power and eagerly want to unleash... View the full article

  39. A recent MIT Tech Review Report shows that 71% of surveyed organizations intend to build their own GenAI models. As more work to... View the full article

  40. Every business wants to be a data and AI vanguard. But to make that happen, companies must commit to a GenAI vision and... View the full article

  41. A guide for developers to create custom DSLs to address specific domain requirements, enhancing productivity and unlocking new possibilities in problem-solving. View the full article

  42. One of the most exciting parts of the Data + AI Summit is hearing about all the ways our over 10,000 global customers... View the full article

  43. Understanding CDP vs DMP: Is a customer data platform or a data management platform the best software for your business' data management future? View the full article

  44. Snowflake is a cloud data warehouse that has taken the world by storm, establishing itself as one of the core technologies in the cloud era. Snowflake is a cross-cloud platform; you can run it on AWS, Azure, or GCP. Central to its power is the Snowflake Semantic Layer, which helps organizations transform raw data into […] View the full article

  45. Started by Databricks,

    Introduction: Twelve Labs Embed API enables users to use natural language to explore the content of video libraries, as well as generate summaries... View the full article

  46. We're excited to announce that looping for Tasks in Databricks Workflows with For Each is now Generally Available! This new task type makes... View the full article

  47. We recently announced the general availability of serverless compute for Notebooks, Workflows, and Delta Live Tables (DLT) pipelines. Today, we'd like to explain... View the full article

  48. We're excited to announce the general availability of hybrid search in Mosaic AI Vector Search. Hybrid search is a powerful feature that combines... View the full article

  49. All the code is available in this GitHub repository. Prior to reading this blog we recommend reading Getting Started with Delta Live... View the full article

  50. With growing data and business needs, having an efficient data integration tool to migrate and manage your data has become crucial. Almost every organization keeps its data in different locations, from internal databases to SaaS platforms. To get an overview of the operations or state of finances, organizations pull the data from all […] View the full article