Trending repositories for topic data-pipelines

Last 3 days (new repositories)

No newly created repositories trended in the last 3 days.

Last 3 days (absolute gain)

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+376)

StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...

3,051 (+31)

apache-2.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+23)

apache-2.0

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+22)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+12)

cc0-1.0

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+11)

apache-2.0

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+4)

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+4)

apache-2.0

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,246 (+4)

apache-2.0

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+3)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+2)

apache-2.0

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+1)

dataflint/spark

Performance Observability for Apache Spark

247 (+1)

apache-2.0

Last 3 days (relative gain)

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+2%)

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+2%)

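StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...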

3,051 (+1%)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+0.7%)

cc0-1.0

dataflint/spark

Performance Observability for Apache Spark

247 (+0.4%)

apache-2.0

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+0.3%)

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+0.2%)

apache-2.0

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+0.2%)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+0.2%)

apache-2.0

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+0.1%)

apache-2.0

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+0.1%)

apache-2.0

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,246 (+0.0%)

apache-2.0

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+0.0%)

apache-2.0

Last week (new repositories)

No newly created repositories trended in the last week.

Last week (absolute gain)

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+795)

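StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...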

3,051 (+215)

apache-2.0

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+77)

apache-2.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+66)

apache-2.0

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+47)

apache-2.0

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+18)

apache-2.0

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+17)

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+13)

cc0-1.0

infinyon/fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.

4,422 (+12)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+9)

apache-2.0

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,246 (+9)

apache-2.0

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747 (+8)

mit

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

2,025 (+7)

mit

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+7)

apache-2.0

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+7)

mit

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.

1,308 (+4)

apache-2.0

dataflint/spark

Performance Observability for Apache Spark

247 (+3)

apache-2.0

artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

645 (+2)

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+2)

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+1)

Last week (relative gain)

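StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...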

3,051 (+8%)

apache-2.0

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+3%)

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+2%)

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+1%)

dataflint/spark

Performance Observability for Apache Spark

247 (+1%)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+0.9%)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+0.8%)

cc0-1.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+0.6%)

apache-2.0

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+0.6%)

mit

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747 (+0.5%)

mit

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+0.4%)

apache-2.0

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

2,025 (+0.3%)

mit

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+0.3%)

apache-2.0

artie-labs/transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.

645 (+0.3%)

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.

1,308 (+0.3%)

apache-2.0

infinyon/fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.

4,422 (+0.3%)

apache-2.0

elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...

429 (+0.2%)

apache-2.0

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+0.2%)

apache-2.0

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+0.2%)

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+0.1%)

apache-2.0

Last month (new repositories)

No newly created repositories trended in the last month.

Last month (absolute gain)

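StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...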

3,051 (+1,069)

apache-2.0

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+724)

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+422)

apache-2.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+348)

apache-2.0

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+213)

apache-2.0

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+86)

apache-2.0

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+73)

infinyon/fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.

4,422 (+60)

apache-2.0

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,246 (+51)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+49)

apache-2.0

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+44)

mit

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+40)

cc0-1.0

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747 (+36)

mit

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

2,025 (+30)

mit

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+25)

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+25)

apache-2.0

dataform-co/dataform

Dataform is a framework for managing SQL based data operations in BigQuery

896 (+14)

apache-2.0

pyper-dev/pyper

Concurrent Python made simple

1,187 (+14)

mit

dataflint/spark

Performance Observability for Apache Spark

247 (+13)

apache-2.0

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.

1,308 (+13)

apache-2.0

Last month (relative gain)

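StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...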

3,051 (+54%)

apache-2.0

confluentinc/learn-kafka-courses

Learn the basics of Apache Kafka® from leaders in the Kafka community with these video courses covering the Kafka ecosystem and hands-on exercises.

30 (+11%)

mitdbg/palimpzest

A System for (Optimized) Semantic Computation

97 (+7%)

mit

iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

51 (+6%)

gpl-3.0

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+6%)

dataflint/spark

Performance Observability for Apache Spark

247 (+6%)

apache-2.0

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+5%)

apache-2.0

tuva-health/tuva

Main repo including core data model, data marts, data quality tests, and terminology sets.

247 (+5%)

conductor-sdk/conductor-python

Conductor OSS SDK for Python programming language

72 (+4%)

apache-2.0

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+4%)

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+4%)

mit

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+3%)

apache-2.0

DidactHQ/didact-engine

The REST API and execution engine for the Didact Platform.

72 (+3%)

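elementary-data/dbt-data-reliability

dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...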

429 (+3%)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+2%)

cc0-1.0

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+2%)

DidactHQ/didact

The open core .NET job orchestrator that we've been missing

94 (+2%)

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747 (+2%)

mit

siyul-park/uniflow

A high-performance, extremely flexible, and easily extensible universal workflow engine.

51 (+2%)

mit

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+2%)

apache-2.0

Last 12 months (new repositories)

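StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...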

3,051

apache-2.0

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747

mit

pyper-dev/pyper

Concurrent Python made simple

1,187

mit

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053

apache-2.0

runodp/dagster-odp

A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code

30

apache-2.0

montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

28

mit

Last 12 months (absolute gain)

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+22,731)

apache/airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

39,604 (+5,251)

apache-2.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+5,070)

apache-2.0

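StructuredLabs/preswald

Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turni...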

3,051 (+3,048)

apache-2.0

dagster-io/dagster

An orchestration platform for the development, production, and observation of data assets.

12,938 (+2,813)

apache-2.0

infinyon/fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.

4,422 (+1,804)

apache-2.0

ucbepic/docetl

A system for agentic LLM-powered data processing and ETL

1,747 (+1,746)

mit

apache/dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

13,406 (+1,437)

apache-2.0

mage-ai/mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

8,246 (+1,296)

apache-2.0

pyper-dev/pyper

Concurrent Python made simple

1,187 (+1,186)

mit

yobix-ai/extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

1,053 (+1,052)

apache-2.0

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+1,029)

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+1,013)

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+887)

mit

bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

916 (+869)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+623)

cc0-1.0

meltano/meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

2,025 (+448)

mit

elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

2,045 (+319)

apache-2.0

opendatadiscovery/odd-platform

First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.

1,308 (+201)

apache-2.0

dataflint/spark

Performance Observability for Apache Spark

247 (+124)

apache-2.0

Last 12 months (relative gain)

amphi-ai/amphi-etl

Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.

1,039 (+10,290%)

bruin-data/bruin

Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.

916 (+1,849%)

apache-2.0

pathwaycom/pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

24,424 (+1,343%)

mitdbg/palimpzest

A System for (Optimized) Semantic Computation

97 (+1,286%)

mit

siyul-park/uniflow

A high-performance, extremely flexible, and easily extensible universal workflow engine.

51 (+1,175%)

mit

montara-io/dbt-command-center

Never sift through endless dbt™ logs again. dbt Command Center is a free, open-source, local web application that provides a user-friendly interface to monitor and manage dbt runs.

28 (+600%)

mit

feldera/feldera

The Feldera Incremental Computation Engine

1,257 (+415%)

runodp/dagster-odp

A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code

30 (+275%)

apache-2.0

fmind/mlops-python-package

Kickstart your MLOps initiative with a flexible, robust, and productive Python package.

1,227 (+261%)

mit

DidactHQ/didact

The open core .NET job orchestrator that we've been missing

94 (+213%)

kestra-io/examples

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), Cloud Provider Services

25 (+150%)

mit

iesahin/xvc

A robust (🐢) and fast (🐇) MLOps tool for managing data and pipelines in Rust (🦀)

51 (+132%)

gpl-3.0

dataflint/spark

Performance Observability for Apache Spark

247 (+101%)

apache-2.0

Unstructured-IO/unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

10,873 (+87%)

apache-2.0

infinyon/fluvio

Lean and mean distributed stream processing system written in rust and web assembly. Alternative to Kafka + Flink in one.

4,422 (+69%)

apache-2.0

tuva-health/tuva

Main repo including core data model, data marts, data quality tests, and terminology sets.

247 (+66%)

mycelial/mycelial

Move your data with ease.

107 (+62%)

apache-2.0

data-engineering-community/data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

1,644 (+61%)

cc0-1.0

DidactHQ/didact-engine

The REST API and execution engine for the Didact Platform.

72 (+60%)

CogStack/CogStack-NiFi

Building data processing pipelines for documents processing with NLP using Apache NiFi and related services

49 (+48%)