Jump to content

Search the Community

Showing results for tags 'data pipelines'.

  • Search By Tags

    Type tags separated by commas.
  • Search By Author

Content Type


Forums

There are no results to display.

There are no results to display.


Find results in...

Find results that contain...


Date Created

  • Start

    End


Last Updated

  • Start

    End


Filter by number of...

Joined

  • Start

    End


Group


Website URL


LinkedIn Profile URL


About Me


Cloud Platforms


Cloud Experience


Development Experience


Current Role


Skills


Certifications


Favourite Tools


Interests

Found 6 results

  1. Today, businesses all around the world are driven by data. This has led to companies exploiting every available online application, service, and social platform to extract data to better understand the changing market trends. Now, this data requires numerous complex transformations to get ready for Data Analytics. Moreover, companies require technologies that can transfer and […]View the full article
  2. In this post, we highlight the tradeoffs you should consider when deciding to build or buy your data pipelines. View the full article
  3. A gentle introduction to unit testing, mocking and patching for beginners Continue reading on Towards Data Science » View the full article
  4. There are several steps involved in implementing a data pipeline that integrates Apache Kafka with AWS RDS and uses AWS Lambda and API Gateway to feed data into a web application. Here is a high-level overview of how to architect this solution: 1. Set Up Apache Kafka Apache Kafka is a distributed streaming platform that is capable of handling trillions of events a day. To set up Kafka, you can either install it on an EC2 instance or use Amazon Managed Streaming for Kafka (Amazon MSK), which is a fully managed service that makes it easy to build and run applications that use Apache Kafka to process streaming data. View the full article
  5. Streaming data pipelines have become an essential component in modern data-driven organizations. These pipelines enable real-time data ingestion, processing, transformation, and analysis. In this article, we will delve into the architecture and essential details of building a streaming data pipeline. Data Ingestion Data ingestion is the first stage of streaming a data pipeline. It involves capturing data from various sources such as Kafka, MQTT, log files, or APIs. Common techniques for data ingestion include: View the full article
  6. This post series is about mastering offline data pipeline's best practices, focusing on the potent combination of Apache Airflow and data processing engines like Hive and Spark. In Part 1 of our series explored the strategies for enhancing Airflow data pipelines using Apache Hive on AWS EMR. Our primary objective was to attain cost efficiency and establish effective job configurations. In this concluding Part 2, we will extensively explore Apache Spark, another pivotal element in our comprehensive data engineering toolkit. By optimizing the Airflow job parameters specifically for Spark, there is a substantial potential for enhancing performance and realizing substantial cost savings. Why Apache Spark in Airflow? Apache Spark is a really important framework and tool for data processing in companies all about data. It's genuinely outstanding at processing massive amounts of data quickly and efficiently. It's especially great for complex data analytics with fast query performance and advanced analytics capabilities. This makes Spark a preferred choice for enterprises handling vast amounts of data and requiring real-time analytics. View the full article
  • Forum Statistics

    67.4k
    Total Topics
    65.3k
    Total Posts
×
×
  • Create New...