Data Engineering & Data Science

  1. Cyber threats and the tools to combat them have become more sophisticated. SIEM is over 20 years old and has evolved significantly in... View the full article

  2. “Short cuts make long delays.” ― J.R.R. Tolkien, The Fellowship of the Ring. The lakehouse pattern, in which you store all of your struc... View the full article

  3. Data engineers rely on math and statistics to coax insights out of complex, noisy data. Among the most important domains is calculus, which... View the full article

  4. Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for... View the full article

  5. The communications industry is undergoing one of the most significant periods of growth (and change) in its 100+ year history. The dramatic increase... View the full article

  6. Back in July, we released the public preview of the new Databricks Assistant, a context-aware AI assistant available in Databricks Notebooks, SQL editor... View the full article

  7. In today's interconnected digital landscape, data sharing and collaboration across organizations and platforms are crucial for modern business operations. Delta Sharing, an innovative... View the full article

  8. At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or... View the full article

  9. Started by Databricks,

    PySpark has always provided wonderful SQL and Python APIs for querying data. As of Databricks Runtime 12.1 and Apache Spark 3.4, parameterized queries... View the full article

  10. Introduction: Anomaly detection is widely applied across various industries, playing a significant role in the enterprise sector. This blog focuses on its application... View the full article

  11. Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important. View the full article

  12. We are excited to announce that Gartner has recognized Databricks as a Leader for a third consecutive year in the 2023 Gartner® Magic... View the full article

  13. Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE) open... View the full article

  14. Started by Databricks,

    Governance ensures data and AI products are consistently developed and maintained, adhering to precise guidelines and standards. It's the blueprint for architects, bringing... View the full article

  15. We are excited to share new identity and access management features to help simplify the set-up and scale of Databricks for admins. Unity... View the full article

  16. Request a meeting with Databricks executives/thought leaders at NRF! Each January, thousands of leaders from retailers around the globe gather at Javits Center... View the full article

  17. An effective campaign can help improve a company's revenue by increasing the sales of its products, clearing out more stock, bringing in more... View the full article

  18. Started by Databricks,

    "AFROTECH was not only insightful, but also greatly heightened my sense of belonging in the tech space! It was amazing to both make... View the full article

  19. We’re excited to announce that the latest release of sparklyr on CRAN introduces support for Databricks Connect. R users now have seamless access t... View the full article

  20. Introduction: Databricks Lakehouse Monitoring allows you to monitor all your data pipelines – from data to features to ML models – without additional too... View the full article

  21. As businesses grow, data volumes scale from GBs to TBs (or more), and latency demands go from hours to minutes (or less), making... View the full article

  22. Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications typically use... View the full article

  23. Enterprise leaders are turning to the Databricks Data Intelligence Platform to create a centralized source of high-quality data that business teams can leverage... View the full article

  24. Following the announcement we made yesterday around Retrieval Augmented Generation (RAG), today, we’re excited to announce the public preview of Databricks Vector Search. W... View the full article

  25. Retrieval-Augmented-Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are... View the full article

  26. This was written in collaboration with Andrew Mullins, Director of Data Science at Kin + Carta. With the rise of new technologies from... View the full article

  27. We’re excited to announce the launch of Azure Qatar. With the expanded availability of Azure Databricks, it is now easier than ever for o... View the full article

  28. Recent data show that the number of recall campaigns caused by product deficiencies keeps increasing, while each known recorded case is a multi-million... View the full article

  29. To mark the announcement of Databricks listing in Guidewire Marketplace, Marcela Granados, our GTM Director for Insurance, Justin Fenton, Senior Director, Alliances, sat... View the full article

  30. This blog was written in collaboration with Ben Eisenberg, VP of Innovation at People Data Labs, and Tom Ashenmacher, Chief Revenue Officer at... View the full article

  31. We are excited to introduce five new integrations in Databricks Partner Connect—a one-stop portal enabling you to use partner solutions with your Databricks D... View the full article

  32. Background: Modernizing Data Delivery. Today's enterprise data estates are vastly different from 10 years ago. Industries have transitioned their analytics from monolithic data... View the full article

  33. Defining what a data culture is can vary by organization. A data culture is the shared values, attitudes, and behaviors that enable organizations... View the full article

  34. Want to support the behavior of built-in functions and method calls in your Python classes? Magic methods in Python let you do just that! So let’s uncover the method behind the magic. View the full article

  35. Building on the momentum of Databricks Assistant, the context-aware AI assistant integrated within Databricks Notebooks, SQL editor, and file editor, and now powering... View the full article

  36. The costs of fraud are staggering. In 2022, just one type of fraud, card-not-present fraud, resulted in almost $6bn in losses in the... View the full article

  37. Started by KDnuggets,

    This article is about the four key soft skills every data scientist needs, and how to work on them. View the full article

  38. We recently announced our AI-generated documentation feature, which uses large language models (LLMs) to automatically generate documentation for tables and columns in Unity... View the full article

  39. Started by TDS,

    Let’s talk about data engineers’ nightmare. Continue reading on Towards Data Science » View the full article

  40. A key element in orchestrating multi-stage data and AI processes and pipelines is control flow management. This is why we continue to invest... View the full article

  41. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  42. Today we are excited to announce the general availability of Azure Databricks support for Azure confidential computing (ACC)! With support for Azure confidential... View the full article

  43. Started by Databricks,

    The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives, from the... View the full article

  44. Managing the environment of an application in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to... View the full article

  45. Started by Databricks,

    Today, we introduce the new availability of named arguments for SQL functions. With this feature, you can invoke functions in more flexible ways... View the full article

  46. Started by TDS,

    Readers Digest to Learn Data Engineering Gradually. Continue reading on Towards Data Science » View the full article

  47. In this four-part blog series, "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  48. In today's data-driven world, organizations face the challenge of effectively ingesting and processing data at an unprecedented scale. With the amount and variety... View the full article

  49. Insurance companies have seen a tremendous shift in modernization. Traditionally known for the use of legacy systems, leading carriers are modernizing their infrastructure... View the full article

  50. Large language models (LLMs) have set the corporate world ablaze, and everyone wants to take advantage. In fact, 47% of enterprises expect to... View the full article