Trending repositories for topic data-integration
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Turns Data and AI algorithms into production-ready web applications in no time.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
An orchestration platform for the development, production, and observation of data assets.
The open source high performance ELT framework powered by Apache Arrow
🧙 Build, run, and manage data pipelines for integrating and transforming data.
SeaTunnel is a next-generation, high-performance, distributed data integration platform for the synchronization and transformation of massive data (offline & real-time).
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of records every day.
Powerful RDF Knowledge Graph Generation with RML Mappings
Community-curated list of software packages and data resources for single-cell data, including RNA-seq, ATAC-seq, etc.
Privacy and Security focused Segment-alternative, in Golang and React
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Demonstrate integration of Senzing and Neo4j to construct an Entity Resolved Knowledge Graph
🎼 Integrate multiple high-dimensional datasets with fuzzy k-means and locally linear adjustments.
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a...
Fast, sensitive and accurate integration of single-cell data with Harmony
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Self-contained distributed software platform for building stateful, massively real-time streaming applications in Rust.
The Common Core Ontology Repository holds the current released version of the Common Core Ontology suite.
A tool facilitating matching for any dataset discovery method. Also, an extensible experiment suite for state-of-the-art schema matching methods.
Toolbox for including enzyme constraints on a genome-scale model.