Search the Community
Showing results for tags 'data pipelines'.
Today, businesses around the world are driven by data. As a result, companies extract data from every available online application, service, and social platform to better understand changing market trends. This data then requires numerous complex transformations before it is ready for analytics. Moreover, companies need technologies that can transfer and […] View the full article
Tagged with: data pipelines, comparisons (and 2 more)
There are several steps involved in implementing a data pipeline that integrates Apache Kafka with AWS RDS and uses AWS Lambda and API Gateway to feed data into a web application. Here is a high-level overview of how to architect this solution:

1. Set Up Apache Kafka
Apache Kafka is a distributed streaming platform capable of handling trillions of events a day. To set up Kafka, you can either install it on an EC2 instance or use Amazon Managed Streaming for Apache Kafka (Amazon MSK), a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. View the full article
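As a rough illustration of the Lambda leg of such a pipeline (a minimal sketch, not taken from the article), the snippet below shows an MSK-triggered handler that decodes each Kafka record and upserts it into an RDS PostgreSQL table. The topic payload shape, the events table, and the environment variable names are assumptions.

```python
# Sketch: Lambda handler fed by an MSK/Kafka event source mapping, writing to RDS.
# Table name, payload fields, and env vars are hypothetical.
import base64
import json
import os

import psycopg2  # bundled with the Lambda deployment package or a layer

# Reuse the connection across invocations; credentials come from function env vars.
conn = psycopg2.connect(
    host=os.environ["RDS_HOST"],
    dbname=os.environ["RDS_DB"],
    user=os.environ["RDS_USER"],
    password=os.environ["RDS_PASSWORD"],
)

def handler(event, context):
    """Consume a batch of Kafka records delivered by the MSK event source."""
    rows = []
    for records in event.get("records", {}).values():  # keyed by "topic-partition"
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            rows.append((payload["id"], json.dumps(payload)))

    with conn, conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO events (id, body) VALUES (%s, %s) "
            "ON CONFLICT (id) DO UPDATE SET body = EXCLUDED.body",
            rows,
        )
    return {"inserted": len(rows)}
```

The web application would then read from RDS via a separate Lambda behind API Gateway rather than querying Kafka directly.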
Streaming data pipelines have become an essential component of modern data-driven organizations. These pipelines enable real-time data ingestion, processing, transformation, and analysis. In this article, we will delve into the architecture and essential details of building a streaming data pipeline.

Data Ingestion
Data ingestion is the first stage of a streaming data pipeline. It involves capturing data from various sources such as Kafka, MQTT, log files, or APIs. Common techniques for data ingestion include: View the full article
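As a minimal sketch of that ingestion stage using one of the sources the article lists (Kafka), the consumer below captures raw events and hands each one to the next stage of the pipeline. The topic name, broker address, and the process_event callback are illustrative assumptions.

```python
# Sketch: Kafka-based ingestion stage of a streaming pipeline.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "clickstream-raw",                       # hypothetical source topic
    bootstrap_servers="localhost:9092",
    group_id="ingestion-stage",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

def process_event(event: dict) -> None:
    """Placeholder for the downstream processing/transformation stage."""
    print(event)

for message in consumer:                      # blocks, yielding records as they arrive
    process_event(message.value)
```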
Tagged with: data streaming, streaming (and 1 more)
This post series is about mastering offline data pipeline best practices, focusing on the potent combination of Apache Airflow and data processing engines such as Hive and Spark. In Part 1 of the series, we explored strategies for enhancing Airflow data pipelines using Apache Hive on AWS EMR, with the primary objective of attaining cost efficiency and establishing effective job configurations. In this concluding Part 2, we turn to Apache Spark, another pivotal element of the data engineering toolkit. By optimizing Airflow job parameters specifically for Spark, there is substantial potential for improving performance and realizing significant cost savings.

Why Apache Spark in Airflow?
Apache Spark is a cornerstone framework for data processing in data-driven companies. It excels at processing massive amounts of data quickly and efficiently, and it is especially well suited to complex data analytics thanks to fast query performance and advanced analytics capabilities. This makes Spark a preferred choice for enterprises handling vast amounts of data and requiring real-time analytics. View the full article
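As a rough sketch of the kind of tuning Part 2 discusses (the parameter values and job path below are illustrative assumptions, not the series' recommendations), an Airflow DAG can pass Spark-specific settings through the SparkSubmitOperator:

```python
# Sketch: Airflow DAG submitting a Spark job with explicit executor sizing
# and dynamic allocation. DAG name, job path, and values are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_spark_aggregation",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    aggregate = SparkSubmitOperator(
        task_id="aggregate_events",
        application="s3://my-bucket/jobs/aggregate_events.py",  # hypothetical job
        conn_id="spark_default",
        executor_memory="4g",
        executor_cores=2,
        conf={
            "spark.dynamicAllocation.enabled": "true",
            "spark.dynamicAllocation.maxExecutors": "20",
            "spark.sql.shuffle.partitions": "200",
        },
    )
```

Sizing executors and shuffle partitions to the actual workload, rather than leaving cluster defaults in place, is where most of the performance and cost gains come from.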
Tagged with: best practices, scheduling (and 2 more)
Forum Statistics
Total Topics: 67.4k
Total Posts: 65.3k