54 results found

14.3k
37.0k
apache-2.0
758
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created 2015-04-13
26,564 commits to main branch, last one 9 hours ago
2.2k
22.0k
apache-2.0
128
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Created 2023-12-12
1,535 commits to main branch, last one 18 hours ago
4.6k
12.8k
apache-2.0
327
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high-performance workflows with low-code.
Created 2019-03-01
8,537 commits to dev branch, last one 21 hours ago
1.5k
11.6k
apache-2.0
120
An orchestration platform for the development, production, and observation of data assets.
Created 2018-04-30
21,005 commits to master branch, last one 11 hours ago
743
9.0k
apache-2.0
59
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Created 2022-09-26
1,613 commits to main branch, last one 6 days ago
763
7.9k
apache-2.0
62
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created 2022-05-16
5,465 commits to master branch, last one 5 days ago
136
4.1k
other
28
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
823 commits to main branch, last one 10 hours ago
257
4.1k
apache-2.0
43
Build data pipelines, the easy way 🛠️
Created 2020-05-21
9,688 commits to master branch, last one about a year ago
493
3.9k
apache-2.0
41
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Created 2019-08-31
2,354 commits to master branch, last one 8 hours ago
164
1.9k
apache-2.0
10
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Created 2021-08-30
5,051 commits to master branch, last one 12 hours ago
165
1.8k
mit
14
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created 2021-06-21
11,613 commits to main branch, last one a day ago
312
1.5k
apache-2.0
66
MLeap: Deploy ML Pipelines to Production
Created 2016-08-23
1,034 commits to master branch, last one 4 months ago
The best place to learn data engineering. Built and maintained by the data engineering community.
Created 2021-05-04
267 commits to main branch, last one 10 days ago
111
1.2k
mit
13
A system for agentic LLM-powered data processing and ETL
Created 2024-07-09
622 commits to main branch, last one 22 hours ago
100
1.2k
apache-2.0
19
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created 2021-07-07
825 commits to main branch, last one about a month ago
43
889
other
9
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Created 2024-03-20
221 commits to main branch, last one 21 hours ago
160
845
apache-2.0
27
Dataform is a framework for managing SQL-based data operations in BigQuery
Created 2018-09-03
1,739 commits to main branch, last one a day ago
153
745
apache-2.0
17
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created 2021-03-22
487 commits to main branch, last one about a year ago
46
719
other
12
The Feldera Incremental Computation Engine
Created 2023-05-11
3,750 commits to main branch, last one a day ago
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Created 2023-06-23
43 commits to main branch, last one 3 months ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Created 2022-11-06
887 commits to master branch, last one 5 hours ago
57
428
apache-2.0
17
One framework to develop, deploy and operate data workflows with Python and SQL.
Created 2021-07-20
2,181 commits to main branch, last one 27 days ago
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Created 2022-01-09
3,208 commits to master branch, last one 16 hours ago
24
330
mit
10
Work with your web service, database, and streaming schemas in a single format.
Created 2022-12-07
339 commits to main branch, last one 8 months ago
Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a...
Created 2021-11-23
2,726 commits to main branch, last one 2 months ago
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Created 2023-01-09
51 commits to main branch, last one 3 months ago
18
191
apache-2.0
2
Performance Observability for Apache Spark
Created 2023-09-28
384 commits to main branch, last one 15 hours ago
49
189
unknown
5
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Created 2021-11-12
741 commits to main branch, last one 14 hours ago
8
183
bsd-3-clause
8
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Created 2020-02-26
331 commits to main branch, last one 2 months ago
Relational data pipelines for the science lab
Created 2012-09-19
4,539 commits to master branch, last one 26 days ago