54 results found

14.3k
37.2k
apache-2.0
760
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created 2015-04-13
26,842 commits to main branch, last one 15 hours ago
2.3k
23.3k
apache-2.0
133
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Created 2023-12-12
1,659 commits to main branch, last one 21 hours ago
4.6k
12.9k
apache-2.0
329
Apache DolphinScheduler is a modern data orchestration platform for building high-performance workflows quickly with low code.
Created 2019-03-01
8,552 commits to dev branch, last one a day ago
1.5k
11.7k
apache-2.0
122
An orchestration platform for the development, production, and observation of data assets.
Created 2018-04-30
21,177 commits to master branch, last one 10 hours ago
762
9.2k
apache-2.0
60
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Created 2022-09-26
1,620 commits to main branch, last one 17 hours ago
774
8.0k
apache-2.0
63
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created 2022-05-16
5,495 commits to master branch, last one 2 days ago
139
4.3k
other
29
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
867 commits to main branch, last one 17 hours ago
259
4.1k
apache-2.0
44
Build data pipelines, the easy way 🛠️
Created 2020-05-21
9,688 commits to master branch, last one about a year ago
489
3.9k
apache-2.0
42
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Created 2019-08-31
2,371 commits to master branch, last one a day ago
165
1.9k
apache-2.0
10
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Created 2021-08-30
5,068 commits to master branch, last one a day ago
166
1.8k
mit
14
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created 2021-06-21
11,643 commits to main branch, last one 9 hours ago
313
1.5k
apache-2.0
66
MLeap: Deploy ML Pipelines to Production
Created 2016-08-23
1,050 commits to master branch, last one 8 days ago
The best place to learn data engineering. Built and maintained by the data engineering community.
Created 2021-05-04
272 commits to main branch, last one 2 days ago
117
1.3k
mit
13
A system for agentic LLM-powered data processing and ETL
Created 2024-07-09
679 commits to main branch, last one a day ago
101
1.2k
apache-2.0
19
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created 2021-07-07
825 commits to main branch, last one 2 months ago
44
907
other
9
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Created 2024-03-20
239 commits to main branch, last one 3 days ago
163
851
apache-2.0
27
Dataform is a framework for managing SQL-based data operations in BigQuery.
Created 2018-09-03
1,740 commits to main branch, last one 9 days ago
47
769
other
12
The Feldera Incremental Computation Engine
Created 2023-05-11
3,829 commits to main branch, last one a day ago
153
746
apache-2.0
17
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created 2021-03-22
487 commits to main branch, last one about a year ago
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Created 2023-06-23
43 commits to main branch, last one 3 months ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Created 2022-11-06
917 commits to master branch, last one a day ago
57
430
apache-2.0
17
One framework to develop, deploy and operate data workflows with Python and SQL.
Created 2021-07-20
2,182 commits to main branch, last one 13 days ago
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Created 2022-01-09
3,211 commits to master branch, last one 19 hours ago
24
332
mit
10
Work with your web service, database, and streaming schemas in a single format.
Created 2022-12-07
339 commits to main branch, last one 8 months ago
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a...
Created 2021-11-23
2,726 commits to main branch, last one 2 months ago
19
198
apache-2.0
3
Performance Observability for Apache Spark
Created 2023-09-28
385 commits to main branch, last one 4 days ago
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Created 2023-01-09
54 commits to main branch, last one 2 days ago
50
191
unknown
5
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Created 2021-11-12
746 commits to main branch, last one 5 days ago
8
183
bsd-3-clause
8
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Created 2020-02-26
331 commits to main branch, last one 2 months ago
Relational data pipelines for the science lab
Created 2012-09-19
4,539 commits to master branch, last one about a month ago