Trending repositories for topic data-pipeline
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal vector embeddings.
ingestr is a CLI tool for seamlessly copying data between any databases with a single command.
A list of useful resources to learn Data Engineering from scratch
Practical Data Engineering: A Hands-On Real-Estate Project Guide
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Privacy- and security-focused Segment alternative, in Golang and React
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
Code for "Efficient Data Processing in Spark" Course
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
Memphis.dev is a highly scalable and effortless data streaming platform
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and service...
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
The leader in Next-Generation Customer Data Infrastructure
A curated list of awesome public dbt projects
Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
SQLpipe makes it easy to move the result of one query from one database to another.
Ordered-concurrently is a Go library for concurrent processing with ordered output. It processes work concurrently and returns output in a channel in the order of input. It is useful in concurrently proces...