Data Engineering & Data Science

Data Engineering

  • Data Pipelines (ETL/ELT)

  • Big Data Technologies

  • Cloud Computing for Data

  • Data Governance & Quality

Data Science

  • Machine Learning (ML)

  • Statistical Analysis

  • Data Visualization

  • Natural Language Processing (NLP)

  1. In this four-part blog series “Lessons learned building Cybersecurity Lakehouses,” we are discussing a number of challenges organizations face with data engineering when bui... View the full article

  2. This blog is the first of a series of blog posts highlighting industry-leading data providers we collaborate with and Marketplace data providers. Special... View the full article

  3. Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog p... View the full article

  4. Access all of DataCamp's 460+ data and AI courses, career tracks & certifications ... https://www.datacamp.com/freeweek

    • 0 replies
    • 1.7k views
  5. This blog was written in collaboration with Dan Newingham, Solution Delivery Manager, ZS and Aaron Zavora, Technical Director, HLS, Databricks. Mandates for electronic... View the full article

  6. In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their u... View the full article

  7. We are excited to announce that we have completed our acquisition of Arcion, a leading provider for real-time data replication technologies. Arcion’s capabilities w... View the full article

  8. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we will discuss a number of challenges organizations face with data engineering... View the full article

  9. In this blog we will demonstrate with examples, how you can seamlessly upgrade your Hive metastore (HMS)* tables to Unity Catalog (UC) using... View the full article

  10. Whether you’re an NFL fanatic, an alumnus rooting for your alma mater or a super fan just trying to catch a glimpse of T... View the full article

  11. We are excited to announce the general availability (GA) of several key security features for Databricks on Google Cloud: Private connectivity with Private... View the full article

  12. Today we're excited to announce MLflow 2.8 supports our LLM-as-a-judge metrics which can help save time and costs while providing an approximation of... View the full article

  13. Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book of MLOps covering these product updates and Generative AI requirements. This blog post highlights key updates in the eBook, which can be downloaded here ... View the full article

  14. Introduction: Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and... View the full article

  15. No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people. View the full article

    • 0 replies
    • 109 views
  16. Predictive Optimization intelligently optimizes your Lakehouse table data layouts for peak performance and cost-efficiency - without you needing to lift a finger. View the full article

  17. Announcing GA of Predictive I/O for Updates, which harnesses Photon and AI atop Deletion Vectors in order to significantly speed up MERGE, UPDATE and DELETE operations. View the full article

  18. Providence's MLOps Platform: Providence is a healthcare organization with 120,000 caregivers serving over 50 hospitals and 1,000 clinics across seven states. Providence is... View the full article

  19. Check out our Nearest Neighborhood Search Solution Accelerator to get started quickly. The Member Experience: An insured member typically experiences their healthcare in... View the full article

  20. SAP's recent announcement of a strategic partnership with Databricks has generated significant excitement among SAP customers. Databricks, the data and AI experts, presents... View the full article

  21. Machine learning (ML) is more than just developing models; it's about bringing them to life in real-world, production systems. But transitioning from prototype... View the full article

  22. Pricing plays a crucial role in the success of any consumer packaged goods (CPG) organization. Beyond covering the basic costs of development, manufacturing... View the full article

  23. Introduction: Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to... View the full article

  24. Customer data is the lifeblood of modern organizations in every industry. As organizations level-up their data teams and practices with the Data Lakehouse... View the full article

  25. We are at the outset of the next industrial revolution, powered by AI. Unlike the past four revolutions that stretch across three centuries... View the full article

  26. Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for... View the full article

  27. This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park... View the full article

  28. This post explains how you can orchestrate a PySpark application using Amazon EMR Serverless and AWS Step Functions... View the full article

    • 0 replies
    • 125 views
  29. In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)... View the full article

  30. Understanding the best strategy when dealing with millions of possible combinations: How do you take the gameplay of millions of daily users in... View the full article

  31. SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and requirement for most data science jobs. View the full article

  32. This week: What three data science projects should you choose to guarantee you get the job? • A 7-step guide to help you go from the fundamentals of machine learning and Python to Transformers, recent advances in NLP, and beyond. View the full article

  33. The definitive guide for choosing the right method for your use case. View the full article

  34. RNN, Transformers, and BERT are popular NLP techniques with tradeoffs in sequence modeling, parallelization, and pre-training for downstream tasks. View the full article

  35. We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering... View the full article

  36. Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper. View the full article

  37. We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep... View the full article

  38. Retailers have long shared sales and inventory data with their suppliers. Combined access to this information enables the two parties to assess consumer... View the full article

  39. Databricks has obtained the International Standards Organization (ISO) 27701 certification as a data processor https://www.databricks.com/blog/databricks-obtains-iso-27701-certification

    • 0 replies
    • 179 views
  40. We’re excited to announce that Databricks has obtained the International Standards Organization (ISO) 27701 certification as a data processor. This certification reflects our c... View the full article

    • 0 replies
    • 165 views
  41. Written in partnership with Shell. The energy industry is all about physical assets – from terminals, ships and pipelines to refineries and wind f... View the full article

  42. A common challenge data scientists encounter when developing machine learning solutions is training a model on a dataset that is too large to... View the full article

  43. This blog post was written in collaboration with Eric Schwartz, Director of Partnerships at Ribbon Health, and David Kulwin, Director, Databricks Marketplace. Ensuring... View the full article

  44. Today, we’re excited to announce Brickbuilder Accelerators, an expansion to the Brickbuilder Program that pairs the expertise of system integrator and consulting partners w... View the full article

  45. This blog was written in collaboration with Sukh Sekhon, Software Engineer, Cloud Infrastructure and Helen Li, Sr. Director of Engineering at Exai Bio... View the full article

  46. Biomechanical data has emerged as a game-changing factor for Major League Baseball (MLB) teams, offering a competitive edge in enhancing player performance and... View the full article

  47. This article represents a collaborative effort between Plotly, Ballard Power Systems, and Databricks. Fleets of buses worldwide run on hydrogen fuel cells made... View the full article

  48. We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new... View the full article

  49. In August, Snowflake released new features around Snowpark for Python, DevOps, pipeline replication, and more. Read on to learn more about the full set of features that were just announced. Snowpark Python Updates: Snowpark support for Python 3.9 and 3.10 – general availability; Snowpark External Access – public preview; Tabular Return Values from Python Stored Procedures – general availability; Vectorized User-Defined Table Functions – public preview; Deploy and Manage Snowflake objects and code with ease – public preview; Notifications for better observability – general availability; Data pipelines replication – public p…

    • 0 replies
    • 130 views
  50. Now in preview, AWS Glue Elastic Views is a new capability of AWS Glue that makes it easy to build materialized views that combine and replicate data across multiple data stores without you having to write custom code. With AWS Glue Elastic Views, you can use familiar Structured Query Language (SQL) to quickly create a virtual table—a materialized view—from multiple different source data stores. AWS Glue Elastic Views copies data from each source data store and creates a replica in a target data store. AWS Glue Elastic Views continuously monitors for changes to data in your source data stores, and provides updates to the materialized views in your target data stores autom…

    • 1 reply
    • 450 views