Data Engineering & Data Science

Data Engineering

  • Data Pipelines (ETL/ELT)

  • Big Data Technologies

  • Cloud Computing for Data

  • Data Governance & Quality

Data Science

  • Machine Learning (ML)

  • Statistical Analysis

  • Data Visualization

  • Natural Language Processing (NLP)

  1. Started by TDS,

    Let’s talk about data engineers’ nightmare. Continue reading on Towards Data Science » View the full article

    • 0 replies
    • 9.6k views
  2. A key element in orchestrating multi-stage data and AI processes and pipelines is control flow management. This is why we continue to invest... View the full article

  3. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  4. Today we are excited to announce the general availability of Azure Databricks support for Azure confidential computing (ACC)! With support for Azure confidential... View the full article

  5. Started by Databricks,

    The observation that "software is eating the world" has shaped the modern tech industry. Today, software is ubiquitous in our lives, from the... View the full article

    • 0 replies
    • 2.4k views
  6. Managing the environment of an application in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to... View the full article

  7. Started by Databricks,

    Today, we introduce the new availability of named arguments for SQL functions. With this feature, you can invoke functions in more flexible ways... View the full article

  8. Started by TDS,

    Readers Digest to Learn Data Engineering Gradually. Continue reading on Towards Data Science » View the full article

    • 0 replies
    • 8.7k views
  9. In this four-part blog series, "Lessons learned from building Cybersecurity Lakehouses," we are discussing a number of challenges organizations face with data engineering... View the full article

  10. In today's data-driven world, organizations face the challenge of effectively ingesting and processing data at an unprecedented scale. With the amount and variety... View the full article

  11. Insurance companies have seen a tremendous shift in modernization. Traditionally known for the use of legacy systems, leading carriers are modernizing their infrastructure... View the full article

  12. Large language models (LLMs) have set the corporate world ablaze, and everyone wants to take advantage. In fact, 47% of enterprises expect to... View the full article

  13. Databricks Unity Catalog simplifies data and AI governance by providing a unified solution for organizations to securely discover, access, monitor, and collaborate on... View the full article

  14. In this four-part blog series “Lessons learned building Cybersecurity Lakehouses,” we are discussing a number of challenges organizations face with data engineering when bui... View the full article

  15. This blog is the first of a series of blog posts highlighting industry-leading data providers we collaborate with and Marketplace data providers. Special... View the full article

  16. Apache Spark™ 3.5 and Databricks Runtime 14.0 have brought an exciting feature to the table: Python user-defined table functions (UDTFs). In this blog p... View the full article

  17. Access all of Datacamp's 460+ data and AI courses, career tracks & certifications ... https://www.datacamp.com/freeweek

    • 0 replies
    • 1.7k views
  18. This blog was written in collaboration with Dan Newingham, Solution Delivery Manager, ZS, and Aaron Zavora, Technical Director, HLS, Databricks. Mandates for electronic... View the full article

  19. In Apache Spark™, Python User-Defined Functions (UDFs) are among the most popular features. They empower users to craft custom code tailored to their u... View the full article

  20. We are excited to announce that we have completed our acquisition of Arcion, a leading provider of real-time data replication technologies. Arcion’s capabilities w... View the full article

  21. In this four-part blog series "Lessons learned from building Cybersecurity Lakehouses," we will discuss a number of challenges organizations face with data engineering... View the full article

  22. In this blog, we will demonstrate, with examples, how you can seamlessly upgrade your Hive metastore (HMS) tables to Unity Catalog (UC) using... View the full article

  23. Whether you’re an NFL fanatic, an alumnus rooting for your alma mater or a super fan just trying to catch a glimpse of T... View the full article

  24. We are excited to announce the general availability (GA) of several key security features for Databricks on Google Cloud: Private connectivity with Private... View the full article

  25. Today we're excited to announce that MLflow 2.8 supports our LLM-as-a-judge metrics, which can help save time and costs while providing an approximation of... View the full article

  26. Last year, we published the Big Book of MLOps, outlining guiding principles, design considerations, and reference architectures for Machine Learning Operations (MLOps). Since then, Databricks has added key features simplifying MLOps, and Generative AI has brought new requirements to MLOps platforms and processes. We are excited to announce a new version of the Big Book of MLOps covering these product updates and Generative AI requirements. This blog post highlights key updates in the eBook, which can be downloaded here ... View the full article

  27. Introduction: Four months ago, we shared how AMD had emerged as a capable platform for generative AI and demonstrated how to easily and... View the full article

  28. No-code or low-code functionalities in data science have gained significant traction in recent years. These solutions are well-proven and matured, and they make data science more accessible to a wider range of people. View the full article

    • 0 replies
    • 102 views
  29. Announcing GA of Predictive I/O for Updates, which harnesses Photon and AI atop Deletion Vectors in order to significantly speed up MERGE, UPDATE and DELETE operations. View the full article

  30. Predictive Optimization intelligently optimizes your Lakehouse table data layouts for peak performance and cost-efficiency - without you needing to lift a finger. View the full article

  31. Providence's MLOps Platform: Providence is a healthcare organization with 120,000 caregivers serving over 50 hospitals and 1,000 clinics across seven states. Providence is... View the full article

  32. Check out our Nearest Neighborhood Search Solution Accelerator to get started quickly. The Member Experience: An insured member typically experiences their healthcare in... View the full article

  33. SAP's recent announcement of a strategic partnership with Databricks has generated significant excitement among SAP customers. Databricks, the data and AI experts, presents... View the full article

  34. Machine learning (ML) is more than just developing models; it's about bringing them to life in real-world, production systems. But transitioning from prototype... View the full article

  35. Pricing plays a crucial role in the success of any consumer packaged goods (CPG) organization. Beyond covering the basic costs of development, manufacturing... View the full article

  36. Introduction: Large Language Models (LLMs) have given us a way to generate text, extract information, and identify patterns in industries from healthcare to... View the full article

  37. Customer data is the lifeblood of modern organizations in every industry. As organizations level-up their data teams and practices with the Data Lakehouse... View the full article

  38. We are at the outset of the next industrial revolution, powered by AI. Unlike the past four revolutions that stretch across three centuries... View the full article

  39. Today, we are excited to announce the general availability of the Databricks SQL Statement Execution API on AWS and Azure, with support for... View the full article

  40. This blog was written in collaboration with David Roberts (Analytics Engineering Manager), Kevin P. Buchan Jr (Assistant Vice President, Analytics), and Yubin Park... View the full article

  41. This post explains how you can orchestrate a PySpark application using Amazon EMR Serverless and AWS Step Functions... View the full article

    • 0 replies
    • 112 views
  42. In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs)... View the full article

  43. Understanding the best strategy when dealing with millions of possible combinations. How do you take the gameplay of millions of daily users in... View the full article

  44. SQL is the essential data science language due to its universal database accessibility, efficient data cleaning capabilities, seamless integration with other languages, and its status as a requirement for most data science jobs. View the full article

  45. This week: What three data science projects should you choose to guarantee you get the job? A 7-step guide to help you go from the fundamentals of machine learning and Python to Transformers, recent advances in NLP, and beyond. View the full article

  46. The definitive guide for choosing the right method for your use case. View the full article

  47. RNN, Transformers, and BERT are popular NLP techniques with tradeoffs in sequence modeling, parallelization, and pre-training for downstream tasks. View the full article

  48. We are delighted to announce that Databricks Asset Bundles are now in public preview. Bundles, for short, facilitate the adoption of software engineering... View the full article

  49. Looking to understand the semantic layer and how it can improve your data stack? This GigaOm Sonar report on Semantic Layers can help you delve deeper. View the full article

  50. We’re excited to announce that Meta AI’s Llama 2 foundation chat models are available in the Databricks Marketplace for you to fine-tune and dep... View the full article