Data Engineering & Data Science

Data Engineering

  • Data Pipelines (ETL/ELT)

  • Big Data Technologies

  • Cloud Computing for Data

  • Data Governance & Quality

Data Science

  • Machine Learning (ML)

  • Statistical Analysis

  • Data Visualization

  • Natural Language Processing (NLP)

  1. The Amazon EventBridge connector for Apache Kafka Connect is now generally available. This open-source connector streamlines event integration between Kafka environments and dozens of AWS services and partner integrations, without writing custom integration code or running a separate connector for each target. The connector includes built-in support for Kafka schema registries, offloading large event payloads to S3, and IAM role-based authentication, and is available under the Apache 2.0 license in the AWS GitHub organization. Amazon EventBridge is a serverless event router that enables you to create highly scalable event-driven applications by routing events between your ow…

  2. Learn how Sharesies overcame significant challenges managing its customer data and how RudderStack became the core of its modern data stack.

  3. Think of your manufacturing operation like an orchestra: every instrument needs to play in perfect harmony to create a masterpiece. But instead of violins…

  4. Databricks Assistant is a context-aware AI assistant natively available in the Databricks Data Intelligence Platform. It is designed to simplify SQL and data analysis by…

  5. If the last few weeks have made us certain of something, it’s uncertainty. Supply chains are being completely reimagined to meet the demands of a…

  6. We are excited to introduce the Public Preview of OIDC Token Federation for Enhanced Delta Sharing Security, a major security and usability enhancement for when…

  7. Understanding your customers isn't just about knowing who they are—it's about understanding what they do. Clean, accurate event data is fundamental for this.

  8. Discover why Apache Iceberg is generating so much excitement, especially for streaming data! This accessible lightboard from Tim Berglund demystifies Iceberg, explaining its history and how it functions within modern data architectures. Learn how Confluent's innovative TableFlow simplifies accessing your Apache Kafka topic data as an Iceberg table in your data lake, eliminating the need for cumbersome integrations. If you're looking to streamline data lake querying and streaming data analysis, this is a must-watch…

  9. As organizations scale their Amazon Web Services (AWS) infrastructure, they frequently encounter challenges in orchestrating data and analytics workloads across multiple AWS accounts and AWS Regions. While a multi-account strategy is essential for organizational separation and governance, it creates complexity in maintaining secure data pipelines and managing fine-grained permissions, particularly when different teams manage resources in separate accounts. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that you can use to set up and operate data pipelines in the cloud at scale. Apache Airflow is an open …

  10. Data is the fuel for AI, and organizations are racing to leverage enterprise data to build AI agents, intelligent search, and AI-powered analytics for productivity, deeper insights, and a competitive edge. To power their data clouds, tens of thousands of organizations already choose BigQuery and its integrated AI capabilities. This decade requires AI-native, multimodal, and agentic data-to-AI platforms, with BigQuery leading the way as the autonomous data-to-AI platform. Finally, we have a platform that infuses AI, makes unstructured data a first-class citizen, accelerates open lakehouses and embeds governance...

  11. For decades, businesses have wrestled with unlocking the true potential of their data for real-time operations. Bigtable, Google Cloud's pioneering NoSQL database, has been the engine behind massive-scale, low-latency applications that operate at a global scale. It was purpose-built for the challenges faced in real-time applications, and remains a key piece of Google infrastructure, underpinning products such as YouTube and Ads. This week at Google Cloud Next, we announced continuous materialized views, an expansion of Bigtable's SQL capabilities. Bigtable SQL and continuous materialized views enable users to build fully managed, real-time application backends using familiar SQL syntax, incl…

  12. Today, we are announcing the Data Architect learning pathway, a dedicated learning track that equips data architects with the required resources and skills for success.

  13. Access to high-quality, real-world data is crucial for developing effective machine learning models. However, when this data contains sensitive information, organizations face a significant hurdle…

  14. Databricks Secures Google Cloud Technology Partner of the Year Award for Data & Analytics - Smart Analytics! We’re excited to announce that Databricks has been…

  15. Summary: LLMs have revolutionized software development by increasing the productivity of programmers. However, despite off-the-shelf LLMs being trained on a significant amount of code, they are not…

  16. In modern data architectures, Apache Iceberg has emerged as a popular table format for data lakes, offering key features including ACID transactions and concurrent write support. Although these capabilities are powerful, implementing them effectively in production environments presents unique challenges that require careful consideration. Consider a common scenario: A streaming pipeline continuously writes data to an Iceberg table while scheduled maintenance jobs perform compaction operations. Although Iceberg provides built-in mechanisms to handle concurrent writes, certain conflict scenarios—such as between streaming updates and compaction operations—can lead to transac…
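
  The conflict scenario in item 16 comes down to Iceberg's optimistic concurrency model: each writer commits against the table snapshot it last read, and the commit fails if another writer (such as a compaction job) advanced the table first, after which the writer rebases and retries. The following is a minimal, library-free Python sketch of that pattern; the names `Table`, `CommitConflict`, and `commit_with_retry` are illustrative stand-ins, not Iceberg's actual API.

  ```python
  class CommitConflict(Exception):
      """Raised when another writer committed first (an illustrative
      stand-in for Iceberg's commit-failure exception)."""

  class Table:
      """Toy table tracking a snapshot id, mimicking optimistic commits."""
      def __init__(self):
          self.snapshot_id = 0

      def commit(self, based_on):
          # The commit succeeds only if no other writer advanced the
          # snapshot since this writer read it.
          if based_on != self.snapshot_id:
              raise CommitConflict(f"expected {based_on}, found {self.snapshot_id}")
          self.snapshot_id += 1
          return self.snapshot_id

  def commit_with_retry(table, max_attempts=5):
      """Re-read the latest snapshot and retry after each conflict."""
      for _ in range(max_attempts):
          base = table.snapshot_id  # refresh table state before committing
          try:
              return table.commit(base)
          except CommitConflict:
              continue  # rebase on the new snapshot and try again
      raise RuntimeError("gave up after repeated conflicts")

  t = Table()
  stale = t.snapshot_id      # streaming writer reads snapshot 0
  t.commit(t.snapshot_id)    # compaction commits first -> snapshot 1
  try:
      t.commit(stale)        # streaming writer's stale commit conflicts
  except CommitConflict:
      pass
  print(commit_with_retry(t))  # retry against the fresh snapshot -> 2
  ```

  In a real pipeline, the retry must also re-validate the operation against the refreshed snapshot (for example, that rows rewritten by compaction were not concurrently updated); Iceberg performs that validation as part of each commit attempt.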

  17. Learn how to strike the right balance between real-time and warehouse-gated customer data architecture.

  18. At Databricks, we help our customers solve their problems by leveraging data and AI. To pursue this mission, we are continuing to expand our presence…

  19. Learn how Zoopla transformed real estate experiences in the UK with data-driven personalization and RudderStack's customer data infrastructure.

  20. Today, AWS announces the general availability of AWS Glue G.4X and G.8X workers in the US West (N. California), Asia Pacific (Seoul), Asia Pacific (Mumbai), Europe (London), Europe (Spain), and South America (São Paulo) AWS Regions. Glue G.4X and G.8X workers enable you to run your most demanding serverless data integration workloads in these additional Regions. AWS Glue is a serverless, scalable data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources. AWS Glue G.4X and G.8X workers provide higher compute, memory, and storage resources than current Glue workers. These new worker types help you scale and r…

  21. The distinctions and intersections between Data Science, Machine Learning, and Artificial Intelligence can be complex and controversial.

  22. Are you a startup building core, customer-facing B2B products on Databricks? Then we have a Challenge for you! On the heels of our Generative AI...

  23. With today’s launch, AWS Clean Rooms provides additional privacy-enhancing controls to support aggregation and list analysis rules using the Spark analytics engine. Using AWS Clean Rooms Spark SQL, you and your partners can now manage how your data is used with aggregation, list, and custom analysis rules, running SQL queries with configurable resources based on your performance, scale, and cost requirements. For example, advertisers can use list analysis rules to create targeted audience segments from collective advertiser and publisher data sets without sharing the raw data used to create the segments. Similarly, publishers and their partners can run media planning a…

  24. Since our launch on Google Cloud Platform (GCP) in 2021, Databricks on Google Cloud has provided more than 1,500 joint customers with a tightly integrated...

  25. We’re excited to announce the General Availability of Lakeflow Connect for Salesforce and Workday. Lakeflow Connect introduces no-code ingestion connectors for popular SaaS applications, databases,...

  26. Introduction: Game developers have always looked to build ongoing relationships with their players to maximize the play they bring to the world, and the success...

  27. Understanding GraphRAG: What is a Knowledge Graph? To understand why one may use a Knowledge Graph (KG) instead of another structured data representation, it’s important…

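
  Item 27 is truncated, but the core idea behind GraphRAG is small enough to sketch: a knowledge graph stores facts as subject-predicate-object triples, and retrieval expands a query entity into related facts by following typed edges. Below is a minimal, illustrative Python sketch (not any particular GraphRAG library's API); the class name, example entities, and relations are assumptions for demonstration only.

  ```python
  from collections import defaultdict

  class KnowledgeGraph:
      """A tiny in-memory triple store: (subject, predicate, object)."""
      def __init__(self):
          self.triples = set()
          self.by_subject = defaultdict(set)

      def add(self, subj, pred, obj):
          self.triples.add((subj, pred, obj))
          self.by_subject[subj].add((pred, obj))

      def neighbors(self, subj, pred=None):
          """Follow outgoing edges from an entity, optionally filtered by relation."""
          return {o for p, o in self.by_subject[subj] if pred is None or p == pred}

  kg = KnowledgeGraph()
  kg.add("Databricks", "develops", "Delta Sharing")
  kg.add("Databricks", "runs_on", "AWS")
  kg.add("Delta Sharing", "category", "data sharing protocol")

  # A retrieval step can expand a query entity into related facts,
  # which are then handed to the LLM as grounded context:
  print(kg.neighbors("Databricks", "develops"))  # -> {'Delta Sharing'}
  ```

  Unlike a flat vector index, this structure preserves explicit relationships between entities, which is why GraphRAG can answer multi-hop questions that pure similarity search struggles with.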
  28. Learn about the importance of building a strong data foundation in the Starter Stage of the data maturity journey.

  29. Discover how the Canadian Football League transformed its fan engagement strategy by unifying ticketing, e-commerce, and fan data with RudderStack.

  30. As more and more organizations embrace analytics, a wider range of problems are being brought forward to be solved. While data science teams are often...

  31. We’re excited to announce the Public Preview of the Microsoft Power BI task type in Databricks Workflows, available on Azure, AWS, and GCP. With this...

  32. Within the big data and analytics space there are two names at the forefront of conversation: Apache Spark and Databricks. While they’re closely related, they serve very different purposes in the data ecosystem. Understanding their core differences is critical for architects, developers, and data engineers looking to build scalable, high-performance data solutions in the cloud. […] The article Databricks vs Apache Spark: Key Differences and When to Use Each was originally published on Build5Nines. To stay up-to-date, Subscribe to the Build5Nines Newsletter.

  33. Willis Towers Watson (WTW) is a multinational company that provides a wide range of services in commercial insurance brokerage, risk management, employee benefits, and actuarial...

  34. Qwen models, developed by Alibaba, have shown strong performance in both code completion and instruction tasks. In this blog, we’ll show how you can register...

  35. Databricks enables organizations to securely share data, AI models, and analytics across teams, partners, and platforms without duplication or vendor lock-in. With Delta Sharing, Databricks...

  36. Last year, Databricks introduced Databricks Apps, completing its suite of tools that allows users to create and deploy applications directly on the Databricks Platform. With...

  37. We’re excited to announce that Anthropic Claude 3.7 Sonnet is now natively available in Databricks across AWS, Azure, and GCP. For the first time, you…

  38. Prisma Cloud is the leading Cloud Security platform that provides comprehensive code-to-cloud visibility into your risks and incidents, offering key remediation capabilities to manage and…

  39. Large language models are challenging to adapt to new enterprise tasks. Prompting is error-prone and achieves limited quality gains, while fine-tuning requires large amounts of…

  40. Databricks Apps provide a robust platform for building and hosting interactive applications. React is great for building modern, dynamic web applications that need to update…

  41. Driving Sustainable Aluminum Production: How to Calculate the Material Recovery Ratio with GraphFrames. Sustainable production has become an imperative in today’s manufacturing market. According to…

  42. The journey to data maturity is about taking the right steps at the right time to unlock value from the data you have. Learn how RudderStack can help.

  43. We’re excited to announce the General Availability of Explore in Tableau, a new integration that lets you create Tableau Cloud visualizations directly from Unity Catalog...

  44. We’re making it easier than ever for Databricks customers to run secure, scalable Apache Spark™ workloads on Unity Catalog Compute with Unity Catalog Lakeguard. In...
