Statistics for topic data-pipeline
RepositoryStats tracks 609,853 GitHub repositories; of these, 79 are tagged with the data-pipeline topic. The most common primary language for repositories using this topic is Python (34).
Stargazers over time for topic data-pipeline
Most starred repositories for topic data-pipeline
Trending repositories for topic data-pipeline
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.
A list of useful resources to learn Data Engineering from scratch
Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Starting with MongoDB
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
A curated list of open source tools used in analytics platforms and data engineering ecosystem
📡 Real-time data pipeline with Kafka, Flink, Iceberg, Trino, MinIO, and Superset. Ideal for learning data systems.
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
Code for "Efficient Data Processing in Spark" Course
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and service...