Data Engineering

  1. Special thanks to Phillip Jones, Senior Product Manager, and Harshal Brahmbhatt, Systems Engineer from Cloudflare for their contributions to this blog. Organizations across... View the full article

  2. In today's environment, proactive cybersecurity is crucial to any public sector agency. For many organizations, log data that security professionals need for effective... View the full article

  3. Today, we are excited to announce that Unity Catalog Volumes is now generally available on AWS, Azure, and GCP. Unity Catalog provides a... View the full article

  4. We are excited to announce the upcoming general availability of Azure Private Link support for Databricks SQL (DBSQL) Serverless, planned in April 2024... View the full article

  5. About UK Power Networks UK Power Networks is the largest electricity distributor in the UK. It maintains electricity cables and lines in London... View the full article

  6. For the past two years, Databricks has collaborated with leading consulting partners to build innovative solutions for industry, migration, and data and AI... View the full article

  7. In the dynamic realm of AI-driven forecasting, businesses navigate a landscape where strategic choices shape their trajectory. One such pivotal decision was made... View the full article

  8. Pretrained large language models aren’t particularly good at responding in concise, coherent sentences out of the box. At a minimum, they have to b... View the full article

  9. What is the US Air Force (USAF) Hackathon? The Air Force Test Center (AFTC) Data Hackathon is a consortium of test experts across... View the full article

  10. In April 2023 we announced the release of Databricks ARC to enable simple, automated linking of data within a single table. Today we... View the full article

  11. This blog was written in collaboration with Anand Iyer, PhD, MBA, Chief Analytics Officer and Abhi Kumbara, Data Science Manager at Welldoc The... View the full article

  12. As Chief Scientist (Neural Networks) at Databricks, I lead our research team toward the goal of giving everyone the ability to build and... View the full article

  13. In this blog post, we will share how you can use Databricks SQL Materialized Views with Lakeview dashboards to deliver fresh data and... View the full article

  14. There are thousands of datasets available to institutional investors, each dataset promising to unlock significant insights in investment decisioning. Across the thousands of... View the full article

  15. Welcome to the blog series covering product advancements in 2023 for Databricks SQL, the serverless data warehouse from Databricks. This is part 2... View the full article

  16. Quantization is a technique for making machine learning models smaller and faster. We quantize Llama2-70B-Chat, producing an equivalent-quality model that generates 2.2x more... View the full article

  17. At Databricks, we believe that AI will change the way that enterprises interact with their data. That’s why today, we're excited to welcome t... View the full article

  18. Databricks recently announced the Data Intelligence Platform, a natural evolution of the lakehouse architecture we pioneered. The idea of a Data Intelligence Platform... View the full article

  19. Reliable, accurate and trusted data is the most critical requirement for any data application in an enterprise. As Databricks customers increasingly rely on... View the full article

  20. Today, we are announcing the industry's first Generative AI Engineer learning pathway and certification to help ensure that data and AI practitioners have... View the full article

  21. This post is part of a series. Check out Part 1: The Data + AI Trifecta: People, Process, and Platform In the current... View the full article

  22. This is part 1 of a blog series where we look back at the major areas of progress for Databricks SQL in 2023... View the full article

  23. Since COVID, countless articles have been written about the "Great Resignation", including in-depth analysis by the World Economic Forum. One key thing this... View the full article

  24. Cyber threats and the tools to combat them have become more sophisticated. SIEM is over 20 years old and has evolved significantly in... View the full article

  25. “Short cuts make long delays.” ― J.R.R. Tolkien, The Fellowship of the Ring The lakehouse pattern, in which you store all of your struc... View the full article

  26. Data engineers rely on math and statistics to coax insights out of complex, noisy data. Among the most important domains is calculus, which... View the full article

  27. Ray is an open-source unified compute framework that simplifies scaling AI and Python workloads in a distributed environment. Since we introduced support for... View the full article

  28. The communications industry is undergoing one of the most significant periods of growth (and change) in its 100+ year history. The dramatic increase... View the full article

  29. Back in July, we released the public preview of the new Databricks Assistant, a context-aware AI assistant available in Databricks Notebooks, SQL editor... View the full article

  30. In today's interconnected digital landscape, data sharing and collaboration across organizations and platforms are crucial for modern business operations. Delta Sharing, an innovative... View the full article

  31. At Databricks, we want to help our customers build and deploy generative AI applications on their own data without sacrificing data privacy or... View the full article

  32. PySpark has always provided wonderful SQL and Python APIs for querying data. As of Databricks Runtime 12.1 and Apache Spark 3.4, parameterized queries... View the full article

  33. Introduction Anomaly detection is widely applied across various industries, playing a significant role in the enterprise sector. This blog focuses on its application... View the full article

  34. Over the past six months, we've been working with NVIDIA to get the most out of their new TensorRT-LLM library. TensorRT-LLM provides an easy-to-use Python interface to integrate with a web server for fast, efficient inference performance with LLMs. In this post, we're highlighting some key areas where our collaboration with NVIDIA has been particularly important. View the full article

  35. We are excited to announce that Gartner has recognized Databricks as a Leader for a third consecutive year in the 2023 Gartner® Magic... View the full article

  36. Today, Databricks is excited to announce support for Mixtral 8x7B in Model Serving. Mixtral 8x7B is a sparse Mixture of Experts (MoE) open... View the full article

  37. Governance ensures data and AI products are consistently developed and maintained, adhering to precise guidelines and standards. It's the blueprint for architects, bringing... View the full article

  38. We are excited to share new identity and access management features to help simplify the set-up and scale of Databricks for admins. Unity... View the full article

  39. Request a meeting with Databricks executives/thought leaders at NRF! Each January, thousands of leaders from retailers around the globe gather at Javits Center... View the full article

  40. An effective campaign can help improve a company's revenue by increasing the sales of its products, clearing out more stock, bringing in more... View the full article

  41. "AFROTECH was not only insightful, but also greatly heightened my sense of belonging in the tech space! It was amazing to both make... View the full article

  42. We’re excited to announce that the latest release of sparklyr on CRAN introduces support for Databricks Connect. R users now have seamless access t... View the full article

  43. Introduction Databricks Lakehouse Monitoring allows you to monitor all your data pipelines – from data to features to ML models – without additional too... View the full article

  44. As businesses grow, data volumes scale from GBs to TBs (or more), and latency demands go from hours to minutes (or less), making... View the full article

  45. Retrieval Augmented Generation (RAG) is an efficient mechanism to provide relevant data as context in Gen AI applications. Most RAG applications typically use... View the full article

  46. Enterprise leaders are turning to the Databricks Data Intelligence Platform to create a centralized source of high-quality data that business teams can leverage... View the full article

  47. Following the announcement we made yesterday around Retrieval Augmented Generation (RAG), today, we’re excited to announce the public preview of Databricks Vector Search. W... View the full article

  48. Retrieval-Augmented-Generation (RAG) has quickly emerged as a powerful way to incorporate proprietary, real-time data into Large Language Model (LLM) applications. Today we are... View the full article

  49. This was written in collaboration with Andrew Mullins, Director of Data Science at Kin + Carta. With the rise of new technologies from... View the full article

  50. We’re excited to announce the launch of Azure Qatar. With the expanded availability of Azure Databricks, it is now easier than ever for o... View the full article

  51. Recent data show that the number of recall campaigns caused by product deficiencies keeps increasing, while each known recorded case is a multi-million... View the full article

  52. To mark the announcement of Databricks listing in Guidewire Marketplace, Marcela Granados, our GTM Director for Insurance, Justin Fenton, Senior Director, Alliances, sat... View the full article

  53. This blog was written in collaboration with Ben Eisenberg, VP of Innovation at People Data Labs, and Tom Ashenmacher, Chief Revenue Officer at... View the full article

  54. We are excited to introduce five new integrations in Databricks Partner Connect—a one-stop portal enabling you to use partner solutions with your Databricks D... View the full article

  55. Background: Modernizing Data Delivery Today's enterprise data estates are vastly different from 10 years ago. Industries have transitioned their analytics from monolithic data... View the full article

  56. Defining what a data culture is can vary by organization. A data culture is the shared values, attitudes, and behaviors that enable organizations... View the full article

  57. Building on the momentum of Databricks Assistant, the context-aware AI assistant integrated within Databricks Notebooks, SQL editor, and file editor, and now powering... View the full article

  58. The costs of fraud are staggering. In 2022, just one type of fraud, card-not-present fraud, resulted in almost $6bn in losses in the... View the full article

  59. We recently announced our AI-generated documentation feature, which uses large language models (LLMs) to automatically generate documentation for tables and columns in Unity... View the full article

  60. A key element in orchestrating multi-stage data and AI processes and pipelines is control flow management. This is why we continue to invest... View the full article

  61. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  62. Today we are excited to announce the general availability of Azure Databricks support for Azure confidential computing (ACC)! With support for Azure confidential... View the full article

  63. The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives, from the... View the full article

  64. Managing the environment of an application in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to... View the full article

  65. Today, we introduce the new availability of named arguments for SQL functions. With this feature, you can invoke functions in more flexible ways... View the full article

  66. In this four-part blog series, "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  67. In today's data-driven world, organizations face the challenge of effectively ingesting and processing data at an unprecedented scale. With the amount and variety... View the full article

  68. Insurance companies have seen a tremendous shift in modernization. Traditionally known for the use of legacy systems, leading carriers are modernizing their infrastructure... View the full article

  69. Large language models (LLMs) have set the corporate world ablaze, and everyone wants to take advantage. In fact, 47% of enterprises expect to... View the full article

  70. Databricks Unity Catalog simplifies data and AI governance by providing a unified solution for organizations to securely discover, access, monitor, and collaborate on... View the full article

  71. In this four-part blog series “Lessons learned building Cybersecurity Lakehouses,” we are discussing a number of challenges organizations face with data engineering when bui... View the full article

  72. This blog is the first of a series of blog posts highlighting industry-leading data providers we collaborate with and Marketplace data providers. Special... View the full article

  73. Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog p... View the full article

  74. This blog was written in collaboration with Dan Newingham, Solution Delivery Manager, ZS and Aaron Zavora, Technical Director, HLS, Databricks Mandates for electronic... View the full article

  75. In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their u... View the full article

  76. We are excited to announce that we have completed our acquisition of Arcion, a leading provider for real-time data replication technologies. Arcion’s capabilities w... View the full article

  77. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we will discuss a number of challenges organizations face with data engineering... View the full article

  78. In this blog we will demonstrate with examples, how you can seamlessly upgrade your Hive metastore (HMS)* tables to Unity Catalog (UC) using... View the full article

  79. Whether you’re an NFL fanatic, an alumnus rooting for your alma mater or a super fan just trying to catch a glimpse of T... View the full article

  80. We are excited to announce the general availability (GA) of several key security features for Databricks on Google Cloud: Private connectivity with Private... View the full article

  81. Today we're excited to announce MLflow 2.8 supports our LLM-as-a-judge metrics which can help save time and costs while providing an approximation of... View the full article

  82. Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book of MLOps covering these product updates and Generative AI requirements. This blog post highlights key updates in the eBook, which can be downloaded here ... View the full article

  83. Introduction Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and... View the full article

  84. Announcing GA of Predictive I/O for Updates, which harnesses Photon and AI atop Deletion Vectors in order to significantly speed up MERGE, UPDATE and DELETE operations. View the full article

  85. Predictive Optimization intelligently optimizes your Lakehouse table data layouts for peak performance and cost-efficiency - without you needing to lift a finger. View the full article

  86. Providence's MLOps Platform Providence is a healthcare organization with 120,000 caregivers serving over 50 hospitals and 1,000 clinics across seven states. Providence is... View the full article

  87. Check out our Nearest Neighborhood Search Solution Accelerator to get started quickly. The Member Experience An insured member typically experiences their healthcare in... View the full article

  88. SAP's recent announcement of a strategic partnership with Databricks has generated significant excitement among SAP customers. Databricks, the data and AI experts, presents... View the full article

  89. Machine learning (ML) is more than just developing models; it's about bringing them to life in real-world, production systems. But transitioning from prototype... View the full article

  90. Pricing plays a crucial role in the success of any consumer packaged goods (CPG) organization. Beyond covering the basic costs of development, manufacturing... View the full article

  91. Introduction Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to... View the full article

  92. Customer data is the lifeblood of modern organizations in every industry. As organizations level-up their data teams and practices with the Data Lakehouse... View the full article

  93. We are at the outset of the next industrial revolution, powered by AI. Unlike the past four revolutions that stretch across three centuries... View the full article

  94. Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for... View the full article

  95. This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park... View the full article

  96. In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)... View the full article

  97. Understanding the best strategy when dealing with millions of possible combinations How do you take the gameplay of millions of daily users in... View the full article

  98. We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering... View the full article

  99. We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep... View the full article

  100. Retailers have long shared sales and inventory data with their suppliers. Combined access to this information enables the two parties to assess consumer... View the full article