53 results found

13.7k
34.9k
apache-2.0
754
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created 2015-04-13
24,622 commits to main branch, last one 13 hours ago
3.7k
14.5k
other
178
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Created 2020-07-27
16,095 commits to master branch, last one 11 hours ago
1.3k
10.5k
apache-2.0
116
An orchestration platform for the development, production, and observation of data assets.
Created 2018-04-30
18,602 commits to master branch, last one 12 hours ago
668
9.3k
apache-2.0
60
Turns Data and AI algorithms into production-ready web applications in no time.
Created 2022-02-18
6,277 commits to develop branch, last one 22 hours ago
1.6k
7.5k
apache-2.0
173
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Created 2017-08-05
3,989 commits to dev branch, last one 23 hours ago
660
7.2k
apache-2.0
62
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created 2022-05-16
5,293 commits to master branch, last one 24 hours ago
392
6.8k
apache-2.0
60
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Created 2019-08-24
3,188 commits to develop branch, last one 12 hours ago
501
5.6k
mpl-2.0
58
The open source high performance ELT framework powered by Apache Arrow
Created 2020-11-18
17,712 commits to main branch, last one 18 hours ago
1.8k
5.4k
apache-2.0
143
Flink CDC is a streaming data integration tool
Created 2020-07-27
974 commits to master branch, last one 2 days ago
2.4k
5.1k
apache-2.0
1.2k
Upserts, Deletes And Incremental Processing on Big Data.
Created 2016-12-14
5,446 commits to master branch, last one 12 hours ago
1.7k
3.9k
apache-2.0
170
A data integration framework
Created 2018-04-03
5,263 commits to master branch, last one 2 days ago
270
3.9k
mit
41
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Created 2020-08-04
820 commits to newjitsu branch, last one 19 hours ago
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Created 2016-06-29
743 commits to master branch, last one 2 months ago
197
2.7k
apache-2.0
35
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Created 2019-08-31
2,254 commits to master branch, last one 13 hours ago
480
2.5k
apache-2.0
49
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
Created 2021-07-08
5,308 commits to main branch, last one a day ago
47
2.4k
mit
13
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Created 2024-02-12
109 commits to main branch, last one 6 days ago
102
2.1k
mit
56
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Created 2018-03-31
172 commits to main branch, last one 5 months ago
326
1.6k
apache-2.0
61
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
Created 2022-09-29
236 commits to master branch, last one 5 months ago
326
870
apache-2.0
46
Hop Orchestration Platform
Created 2019-09-24
6,965 commits to main branch, last one 3 days ago
52
771
apache-2.0
13
Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as ...
Created 2021-04-08
416 commits to master branch, last one about a year ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Created 2022-11-06
611 commits to master branch, last one a day ago
154
525
unknown
37
A collection of Apache Hudi-related resources
Created 2019-12-11
234 commits to master branch, last one 5 days ago
Fast, sensitive and accurate integration of single-cell data with Harmony
Created 2018-06-12
306 commits to master branch, last one 6 months ago
204
444
apache-2.0
20
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created 2022-09-26
2,219 commits to main branch, last one about a month ago
113
441
unknown
15
NicheNet: predict active ligand-target links between interacting cells
Created 2018-02-05
465 commits to master branch, last one a day ago
41
353
apache-2.0
16
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Created 2022-01-11
946 commits to main branch, last one 2 days ago
49
314
bsd-3-clause
14
Reference mapping for single-cell genomics
Created 2019-08-12
1,174 commits to master branch, last one 3 months ago
Work with your web service, database, and streaming schemas in a single format.
Created 2022-12-07
339 commits to main branch, last one 3 months ago
21
292
unknown
30
Categorical Query Language IDE
Created 2019-03-13
118 commits to master branch, last one 2 months ago