53 results found
- Filter by Primary Language:
- Python (19)
- Java (11)
- Go (5)
- JavaScript (4)
- R (3)
- TypeScript (1)
- C# (1)
- Vue (1)
- HTML (1)
- Jupyter Notebook (1)
- MATLAB (1)
- Rust (1)
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created
2015-04-13
24,622 commits to main branch, last one 13 hours ago
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Created
2020-07-27
16,095 commits to master branch, last one 11 hours ago
An orchestration platform for the development, production, and observation of data assets.
Created
2018-04-30
18,602 commits to master branch, last one 12 hours ago
Turns Data and AI algorithms into production-ready web applications in no time.
Created
2022-02-18
6,277 commits to develop branch, last one 22 hours ago
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Created
2017-08-05
3,989 commits to dev branch, last one 23 hours ago
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created
2022-05-16
5,293 commits to master branch, last one 24 hours ago
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Created
2019-08-24
3,188 commits to develop branch, last one 12 hours ago
The open source high performance ELT framework powered by Apache Arrow
Created
2020-11-18
17,712 commits to main branch, last one 18 hours ago
Flink CDC is a streaming data integration tool
Created
2020-07-27
974 commits to master branch, last one 2 days ago
Upserts, Deletes And Incremental Processing on Big Data.
Created
2016-12-14
5,446 commits to master branch, last one 12 hours ago
Privacy and Security focused Segment-alternative, in Golang and React
Created
2019-07-19
5,435 commits to master branch, last one a day ago
A data integration framework
Created
2018-04-03
5,263 commits to master branch, last one 2 days ago
Jitsu is an open-source Segment alternative. Fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.
Created
2020-08-04
820 commits to newjitsu branch, last one 19 hours ago
Community-curated list of software packages and data resources for single-cell, including RNA-seq, ATAC-seq, etc.
Topics: python, analysis, atac-seq, cell-cycle, clustering, single-cell, awesome-list, rna-seq-data, cell-clusters, bioinformatics, scrna-seq-data, gene-expression, cell-populations, data-integration, analysis-pipeline, data-visualization, rna-seq-experiments, cell-differentiation, dimensionality-reduction, gene-expression-profiles
Created
2016-06-29
743 commits to master branch, last one 2 months ago
Lean and mean distributed stream processing system written in Rust and WebAssembly.
Created
2019-08-31
2,254 commits to master branch, last one 13 hours ago
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and co...
Created
2021-07-08
5,308 commits to main branch, last one a day ago
ingestr is a CLI tool to copy data between any databases with a single command seamlessly.
Created
2024-02-12
109 commits to main branch, last one 6 days ago
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Created
2018-03-31
172 commits to main branch, last one 5 months ago
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
Created
2022-09-29
236 commits to master branch, last one 5 months ago
Hop Orchestration Platform
Created
2019-09-24
6,965 commits to main branch, last one 3 days ago
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as ...
Created
2021-04-08
416 commits to master branch, last one about a year ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Created
2022-11-06
611 commits to master branch, last one a day ago
A collection of Apache Hudi-related resources
Created
2019-12-11
234 commits to master branch, last one 5 days ago
Fast, sensitive and accurate integration of single-cell data with Harmony
Created
2018-06-12
306 commits to master branch, last one 6 months ago
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created
2022-09-26
2,219 commits to main branch, last one about a month ago
NicheNet: predict active ligand-target links between interacting cells
Created
2018-02-05
465 commits to master branch, last one a day ago
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
Created
2022-01-11
946 commits to main branch, last one 2 days ago
Reference mapping for single-cell genomics
Created
2019-08-12
1,174 commits to master branch, last one 3 months ago
Work with your web service, database, and streaming schemas in a single format.
Created
2022-12-07
339 commits to main branch, last one 3 months ago
Categorical Query Language IDE
Created
2019-03-13
118 commits to master branch, last one 2 months ago