Trending repositories for topic data-pipelines
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The best place to learn data engineering. Built and maintained by the data engineering community.
An orchestration platform for the development, production, and observation of data assets.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Apache DolphinScheduler is a modern data orchestration platform for quickly creating high-performance workflows with low code.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Building data processing pipelines for document processing with NLP using Apache NiFi and related services
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Dataform is a framework for managing SQL based data operations in BigQuery
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
Main repo including core data model, data marts, data quality tests, and terminology sets.
Conductor OSS SDK for Python programming language
A high-performance, extremely flexible, and easily extensible universal workflow engine.
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services