Trending repositories for topic data-integration
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
The leading data integration platform for ETL/ELT data pipelines from APIs, databases, and files to data warehouses, data lakes, and data lakehouses. Available both self-hosted and cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
Turns Data and AI algorithms into production-ready web applications in no time.
Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka plus Flink in one.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
SeaTunnel is a next-generation, high-performance, distributed data integration tool for synchronizing and transforming massive data (offline and real-time).
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
ingestr is a CLI tool that seamlessly copies data between any two databases with a single command.
Community-curated list of software packages and data resources for single-cell data, including RNA-seq, ATAC-seq, etc.
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
A curated list of open-source tools used in analytics platforms and the data engineering ecosystem
Fast, sensitive and accurate integration of single-cell data with Harmony
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
NicheNet: predict active ligand-target links between interacting cells
Privacy and Security focused Segment-alternative, in Golang and React
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
A configuration-driven framework for building Dagster pipelines that enables teams to create and manage data workflows using YAML/JSON instead of code
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
A .NET class library that allows you to import data from different sources into a unified destination
Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.
Work with your web service, database, and streaming schemas in a single format.
A high-performance, extremely flexible, and easily extensible universal workflow engine.
Building data processing pipelines for document processing with NLP, using Apache NiFi and related services
A tool that facilitates matching for any dataset discovery method, plus an extensible experiment suite for state-of-the-art schema matching methods.