
Data Engineering & Data Science

Data Engineering

  • Data Pipelines (ETL/ELT)

  • Big Data Technologies

  • Cloud Computing for Data

  • Data Governance & Quality

Data Science

  • Machine Learning (ML)

  • Statistical Analysis

  • Data Visualization

  • Natural Language Processing (NLP)

  1. Apache Iceberg and Parquet are popular storage formats in the big data industry, but the two are often confused. So today, we’ll compare these two storage formats, their features, and their unique capabilities. Importantly, they are not competing technologies but complementary ones: they can be used together to maximize the use of both table […] View the full article

  2. Started by Hevo Data,

    In this information age, there has been explosive growth in the rate and variety of data generated daily. From mobile devices and IoT sensors to our online content, unprecedented amounts of data are produced. Traditional database and warehouse technologies cannot handle this burst in data volume and variety. This led to the rise of data lakes […] View the full article

  3. Has it ever occurred to you that the volume of data your business processes daily is too overwhelming? You are not alone. Many companies struggle to manage and analyze enterprise data efficiently. Introducing Snowflake Horizon, a game-changing solution for data management and analysis. In this blog post, I will walk you […] View the full article

  4. Hallucinations in large language models (LLMs) occur when models produce responses that do not align with factual reality or the provided context. This... View the full article

  5. Are you looking for a data lake tool that is scalable, cost-efficient, and accessible, can store your business’s historical data, and can help you perform intelligent analytics? No worries. To lift the weight off your shoulders, I have compiled a list of data lake tools. This list will help you understand each tool’s key features […] View the full article

  6. Building a data lake for reporting, analytics, and machine learning needs has become common practice. Data lakes allow us to ingest data from multiple sources in their raw formats in real time. This enables us to scale to any data size and saves time in defining schemas and transformations. This blog describes a simpler […] View the full article

  7. Discover how we built a real-time personalization engine View the full article

  8. The choice of data management system determines how quickly, and how close to real time, you can store and access information. Some cloud database architectures, like Snowflake, offer a scalable and flexible environment for processing large datasets. Imagine you have a data model in your Snowflake environment and you want to create a web app that takes custom […] View the full article

  9. While you can use Snowpipe for straightforward and low-complexity data ingestion into Snowflake, Snowpipe alternatives, like Kafka, Spark, and COPY, provide enhanced capabilities for real-time data processing, scalability, flexibility in data handling, and broader ecosystem integration. If your requirements are beyond basic data loading, you may find Kafka or Spark more suitable for building robust […] View the full article

  10. Pull raw data, build auto-updated report dashboards, and find the real-time information you need. Follow this step-by-step explanation to learn how to automatically retrieve data from your Postgres database and import it into Google Sheets with a script you can copy and paste into Google Apps Script. Here are some of the best ways to connect […] View the full article

  11. We are thrilled to welcome the Prodvana team to Databricks. At Databricks, we are building one of the world’s largest multi-cloud platforms to... View the full article

  12. We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic... View the full article

  13. Databricks announced the public preview of Mosaic AI Agent Framework & Agent Evaluation alongside our Generative AI Cookbook at the Data + AI... View the full article

  14. Mixture-of-Experts (MoE) has emerged as a promising LLM architecture for efficient training and inference. MoE models like DBRX, which use multiple expert... View the full article

  15. Generative AI (GenAI) can unlock immense value. Organizations are cognizant of the potential but wary of the need to make smart choices about... View the full article

  16. Relational databases, such as MySQL, have traditionally helped enterprises manage and analyze massive volumes of data effectively. However, as scalability, real-time analytics, and seamless data integration become increasingly important, contemporary data systems like Snowflake have become strong substitutes. After experimenting with a few different approaches and learning from my failures, I’m excited to share my […] View the full article

  17. Data generated from various sources can be challenging to integrate and leverage efficiently for sound, data-driven decisions. Oracle data integration, part of the broader Oracle Integration suite, offers a comprehensive set of tools and services for effective data ingestion and processing. The platform offers features for building, deploying, and managing real-time data […] View the full article

  18. Oracle data load is an essential process for organizations wanting to import and manage large volumes of data within Oracle databases. This process helps keep Oracle cloud applications, Oracle E-Business Suite (EBS), and Oracle Autonomous Database up to date with the latest data. You can facilitate loading with various tools that offer user-friendly interfaces. These tools […] View the full article

  19. Introduction Financial institutions face a demanding environment with complex regulatory examinations and a pressing need for flexible and comprehensive risk management solutions. The... View the full article

  20. Today, we are thrilled to announce the general availability of Databricks Assistant and AI-Generated Comments on all cloud platforms. Our mission at... View the full article

  21. The recent Data + AI Summit 2024 was our biggest ever. Over 16,000 of our top customers, prospects, and partners attended in person... View the full article

  22. We’re excited to introduce a revamped Catalog Explorer to streamline your day-to-day interactions, now live across your Unity Catalog-enabled workspaces. The... View the full article

  23. Are you looking for a simple method to set up real-time replication for data in your Oracle database? If yes, you are in the right place. Real-time replication is a typical requirement when using Oracle as a transactional system, so that ETL workloads can run against the replicated instance without putting pressure on […] View the full article

  24. Effective data management requires accurate data capture, storage, processing, and analysis. Date and time values are critical in organizing and filtering data, providing a foundation for efficient data processing. Oracle’s EXTRACT function helps you obtain specific components of date/time values within the Oracle database. The function facilitates precise calculations on date/time values, which can be used […] View the full article

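Oracle’s EXTRACT(YEAR FROM some_date) pulls a single component out of a date/time value. As a rough illustration of the idea (a Python sketch of the same decomposition, not Oracle SQL; the timestamp is invented):

```python
from datetime import datetime

# A sample timestamp standing in for an Oracle DATE/TIMESTAMP column value.
order_ts = datetime(2024, 7, 15, 14, 30, 45)

# The equivalent of Oracle's EXTRACT(YEAR FROM order_ts),
# EXTRACT(MONTH FROM order_ts), and so on.
year, month, day = order_ts.year, order_ts.month, order_ts.day
hour, minute, second = order_ts.hour, order_ts.minute, order_ts.second

# The extracted components can now drive filters and calculations.
print(year, month, day, hour, minute, second)  # 2024 7 15 14 30 45
```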
  25. Organizations deal with data collected from multiple sources, which increases the complexity of managing and processing it. Oracle offers a suite of tools that helps you store and manage the data, and Apache Spark enables you to handle large-scale data processing tasks. The Oracle Spark connector enables data transfer between Apache Spark and Oracle. This […] View the full article

  26. This article gives information about Snowflake master data management, which you can use to enhance your business revenue. What is Master Data Management? Master data management (MDM) uses various tools and techniques to organize and structure master data in a standardized format. It combines data management practices such as data ingestion, integration, modeling, and governance […] View the full article

  27. Your organization may store large volumes of data in a single Snowflake data warehouse. However, extracting data specific to individual departments from the data warehouse can be time-consuming, delaying your analytical and business intelligence tasks. To address this issue, consider building a data mart within Snowflake. These separate data marts allow your organization’s departments to […] View the full article

  28. You can use data warehouses or data lakes as repositories for data management and analytics tasks. Both solutions have their advantages and disadvantages. A data warehouse is the best choice if your organization works only with structured data. A data lake is a suitable choice if your work is based entirely on raw or […] View the full article

  29. This article comprehensively explains Snowflake MAX date operations through various example use cases. What is Snowflake MAX? As a window function, MAX operates on a group of related rows called a window and returns one output row for each input row. The syntaxes of MAX for aggregate and window functions are as follows: Here is an example […] View the full article

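The distinction the teaser draws, MAX as a plain aggregate versus MAX as a window function that keeps one output row per input row, can be demonstrated in any SQL engine with window-function support. A sketch using Python’s built-in sqlite3 module (Snowflake’s syntax for these two forms is analogous; the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("east", 30), ("west", 20)])

# Aggregate MAX: collapses each group to a single row.
agg = conn.execute(
    "SELECT region, MAX(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(agg)  # [('east', 30), ('west', 20)]

# Window MAX: one output row per input row, each carrying its group's max.
win = conn.execute(
    "SELECT region, amount, MAX(amount) OVER (PARTITION BY region) "
    "FROM sales ORDER BY region, amount"
).fetchall()
print(win)  # [('east', 10, 30), ('east', 30, 30), ('west', 20, 20)]
```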
  30. As your business grows, so does the complexity of your data ecosystem. In today’s data-driven world, managing and integrating this massive volume of data is critical yet challenging. You need a powerful tool to streamline your data management. Oracle developed GoldenGate to address this data management issue. Its real-time capability, high […] View the full article

  31. Storage costs are important for any business that deals with large amounts of data daily. These costs are influenced by factors like a storage device’s speed, capacity, reliability, and security and can significantly impact business performance, efficiency, and profitability. Snowflake is a cloud data warehouse with many features to help you store and manage data […] View the full article

  32. Many organizations ponder whether Snowflake, a modern data analytics platform, is a data warehouse or a database. The data warehouse and the database are two key components of Snowflake’s architecture. Both play distinct but complementary roles, helping you store, process, and analyze data efficiently. The Snowflake warehouse is designed to provide you with scalable and efficient computing […] View the full article

  33. Thousands of data architects, engineers, and scientists met at Data + AI Summit in San Francisco to hear from industry luminaries like Fei... View the full article

  34. Today, in a microservices architecture, a large number of applications communicate with each other. Application performance monitoring is useful for debugging a single application; however, when an application expands into multiple services, it becomes important to know the time taken by each service, the stage at which an exception occurs, and the system’s overall health. […] View the full article

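The per-service timing described above is exactly what a tracing span records. A minimal toy sketch of the idea in pure Python (real systems would use a tracing library such as OpenTelemetry; the service names and sleep durations here are invented):

```python
import time
from contextlib import contextmanager

spans = []  # collected (name, duration_in_seconds) pairs

@contextmanager
def span(name):
    """Record how long the wrapped block of work takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append((name, time.perf_counter() - start))

# Simulate one request flowing through two services.
with span("auth-service"):
    time.sleep(0.01)   # stand-in for real work
with span("billing-service"):
    time.sleep(0.02)

for name, duration in spans:
    print(f"{name}: {duration * 1000:.1f} ms")
```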
  35. Manually tracking sales-based leads and collecting data from customer interactions, social media, emails, etc. can be a cumbersome task, especially when your customer base is growing at an exponential rate. This can be streamlined by an autonomous tool like Salesforce. Salesforce is a Customer Relationship Management (CRM) software company based in San Francisco. Salesforce provides […] View the full article

  36. Organizations face a discernible lag in performance as data volumes rise. Traditional data warehouses become a financial burden over time despite proper planning, and companies also suffer storage limitations. However, Amazon rolled out Redshift, providing a cloud-based data warehouse solution that not only addresses data storage and processing issues but also integrates with […] View the full article

  37. With most companies adopting the cloud as their primary choice for storing data, the need for a powerful and robust cloud data warehouse is on the rise. One of the most popular cloud-based data warehouses that meets all these requirements is Google’s BigQuery. It allows users to store potentially terabytes of data with […] View the full article

  38. Debezium is a database monitoring platform that continuously captures and streams real-time modifications made to database systems like MySQL and PostgreSQL. Usually, developers use CLI tools like the default command prompt terminal to work with Debezium, which is the traditional way of setting up a Debezium workspace. To begin working with Debezium, […] View the full article

  39. Started by Hevo Data,

    Salesforce is a subscription-based customer relationship management software that is offered as a completely managed cloud service. Salesforce revolutionized the CRM space by sparing customers the effort of developing custom software or maintaining installations of third-party software. In this blog post, we will discuss how to create Custom Salesforce Reports. Prerequisites Introduction to Salesforce Salesforce […] View the full article

  40. Today, a combination of Debezium and Kafka is embraced by organizations to record changes in databases and provide information to subscribers (other applications). In this article, you will learn about Kafka Debezium, features of Debezium, and how to perform event sourcing using Debezium and Kafka. Prerequisites What is Kafka? Initially developed by LinkedIn, Kafka is […] View the full article

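For context on how Debezium and Kafka fit together: Debezium typically runs as a Kafka Connect source, registered by POSTing a JSON connector configuration to the Connect REST API. An abbreviated, hedged example for MySQL (host name, credentials, and server ID are placeholders, and a real deployment needs additional properties such as schema-history settings; consult the Debezium MySQL connector documentation for the authoritative list):

```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054",
    "topic.prefix": "inventory",
    "table.include.list": "inventory.orders"
  }
}
```

With this registered, each committed change to `inventory.orders` is streamed as an event to a Kafka topic, which downstream subscribers can consume for event sourcing.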
  41. Radiology is an important component of diagnosing and treating disease through medical imaging procedures such as X-rays, computed tomography (CT), magnetic resonance imaging... View the full article

  42. Enhancing the DLT development experience is a core focus because it directly impacts the efficiency and satisfaction of developers building data pipelines with DLT... View the full article

  43. In the insurance sector, customers demand personalized, fast, and efficient service that addresses their needs. Meanwhile, insurance agents must access a large amount... View the full article

  44. We are excited to announce that Gartner has recognized Databricks as a Leader in the 2024 Gartner® Magic Quadrant™ for Data Science and... View the full article

  45. Data analytics helps to derive valuable insights from your raw data. It helps you align your business processes for better outcomes by identifying trends and patterns in the data that would otherwise be lost. As your business accumulates large amounts of data, the challenge lies in implementing an efficient data analytics process that can help […] View the full article

  46. Different systems and databases use various date formats. Converting date data into a consistent format ensures accuracy across systems. For instance, suppose you are collecting sales data from regions that use different formats. Combining and analyzing the sales data would be time-consuming and error-prone if it were not standardized. By converting all the […] View the full article

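Normalizing mixed date formats, as described above, usually means parsing each regional format and re-emitting one canonical form. A small Python sketch (the regional formats and values are assumed for illustration):

```python
from datetime import datetime

# Sales dates arriving from regions with different conventions.
raw = [("US", "07/15/2024"), ("EU", "15.07.2024"), ("ISO", "2024-07-15")]
formats = {"US": "%m/%d/%Y", "EU": "%d.%m.%Y", "ISO": "%Y-%m-%d"}

# Parse each value with its regional format, then re-emit ISO 8601.
normalized = [datetime.strptime(value, formats[region]).strftime("%Y-%m-%d")
              for region, value in raw]
print(normalized)  # ['2024-07-15', '2024-07-15', '2024-07-15']
```

Once every record carries the same format, combining and comparing sales data across regions becomes a plain equality or sort operation.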
  47. Azure Data Factory (ADF) is a Microsoft-managed data integration solution that facilitates the creation of cloud-based data workflows. It is a fully managed service that can be used to build data pipelines by orchestrating data movement. Snowflake is a fully managed SaaS (Software-as-a-Service) tool that offers cloud-based data warehouse services. It provides multi-cloud support and […] View the full article

  48. Organizations often struggle with data silos and inconsistencies because customer data is dispersed across multiple systems. Such scattered data can hinder the ability to make informed, data-driven decisions. Platforms like Salesforce and Snowflake help address these challenges by unifying customer data and providing robust analytics. A Snowflake Salesforce integration offers real-time access to data for […] View the full article

  49. Schema management is crucial for ensuring data quality and consistency in a database. One prominent feature it enables is version control and change management. Version control helps maintain the history of schema versions, allowing an efficient way to track the changes made to the schema. To achieve this, you can use Schemachange, an open-source change […] View the full article

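Schemachange follows a Flyway-style convention in which versioned scripts are named like V1.1__description.sql and applied exactly once, in version order. A simplified Python sketch of that idea against an in-memory SQLite database (the script names and table are invented; Schemachange itself runs against Snowflake and records what it has applied in a change-history table):

```python
import sqlite3

# Versioned migration scripts, possibly discovered out of order on disk.
scripts = {
    "V1.2__add_email.sql": "ALTER TABLE users ADD COLUMN email TEXT",
    "V1.1__create_users.sql": "CREATE TABLE users (id INTEGER, name TEXT)",
}

def version_key(name):
    # "V1.2__add_email.sql" -> (1, 2), so versions sort numerically.
    version = name.split("__")[0].lstrip("V")
    return tuple(int(part) for part in version.split("."))

conn = sqlite3.connect(":memory:")
applied = []
for name in sorted(scripts, key=version_key):
    conn.execute(scripts[name])   # run the migration
    applied.append(name)          # a real tool persists this history

print(applied)
cols = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(cols)  # ['id', 'name', 'email']
```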
  50. How do you visualize your Snowflake data? Snowsight, Snowflake’s visual interface, offers two easy ways to visualize your data within Snowflake: charts and dashboards. If you have a large volume of data from different sources centralized in Snowflake, both of these methods will be very useful to […] View the full article
