Trending repositories for topic data-pipelines
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
An orchestration platform for the development, production, and observation of data assets.
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high-performance workflows with low-code.
The best place to learn data engineering. Built and maintained by the data engineering community.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.
A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
The REST API and execution engine for the Didact Platform.
Conductor OSS SDK for Python programming language
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Smart Automation Tool for building modern Data Lakes and Data Pipelines