Trending repositories for topic data-integration
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Turns Data and AI algorithms into production-ready web applications in no time.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
The open source high performance ELT framework powered by Apache Arrow
🧙 Build, run, and manage data pipelines for integrating and transforming data.
SeaTunnel is a next-generation, high-performance, distributed data integration platform for the synchronization and transformation of massive data (offline & real-time).
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.
Powerful RDF Knowledge Graph Generation with RML Mappings
Community-curated list of software packages and data resources for single-cell data, including RNA-seq, ATAC-seq, etc.
Privacy and Security focused Segment-alternative, in Golang and React
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a...
Fast, sensitive and accurate integration of single-cell data with Harmony
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching methods.
Toolbox for including enzyme constraints on a genome-scale model.