Statistics for topic data-pipeline
RepositoryStats tracks 584,796 GitHub repositories; of these, 74 are tagged with the data-pipeline topic. The most common primary language for repositories using this topic is Python (32).
Stargazers over time for topic data-pipeline
Most starred repositories for topic data-pipeline
Trending repositories for topic data-pipeline
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Practical Data Engineering: A Hands-On Real-Estate Project Guide
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal vector embeddings.
Code for "Efficient Data Processing in Spark" Course
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
A list of useful resources to learn Data Engineering from scratch
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
A curated list of open source tools used in analytics platforms and data engineering ecosystem
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and service...