Trending repositories for topic data-pipeline
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
A list of useful resources to learn Data Engineering from scratch
Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Starting with MongoDB
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Practical Data Engineering: A Hands-On Real-Estate Project Guide
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
A curated list of awesome public DBT projects
The leader in Next-Generation Customer Data Infrastructure
Declarative text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Privacy and Security focused Segment-alternative, in Golang and React
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
End-to-end data platform: A PoC Data Platform project utilizing modern data stack (Spark, Airflow, DBT, Trino, Lightdash, Hive metastore, Minio, Postgres)
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and service...
Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)
Code for "Efficient Data Processing in Spark" Course
Data Engineering - Metropolitan Transportation Authority (MTA) Subway Data Analysis
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collect...
Memphis.dev is a highly scalable and effortless data streaming platform
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments.
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
SQLpipe makes it easy to move the result of one query from one database to another.