54 results found
Filter by Primary Language:
- Python (22)
- Go (5)
- Rust (5)
- TypeScript (4)
- Jupyter Notebook (3)
- Java (3)
- HTML (2)
- JavaScript (2)
- Scala (2)
- C# (1)
- CSS (1)
- PHP (1)
- Shell (1)
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created
2015-04-13
26,564 commits to main branch, last one 9 hours ago
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Created
2023-12-12
1,535 commits to main branch, last one 18 hours ago
Apache DolphinScheduler is the modern data orchestration platform. Agile creation of high-performance workflows with low code.
Created
2019-03-01
8,537 commits to dev branch, last one 21 hours ago
An orchestration platform for the development, production, and observation of data assets.
Created
2018-04-30
21,005 commits to master branch, last one 11 hours ago
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Created
2022-09-26
1,613 commits to main branch, last one 6 days ago
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created
2022-05-16
5,465 commits to master branch, last one 5 days ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
823 commits to main branch, last one 10 hours ago
Build data pipelines, the easy way 🛠️
Created
2020-05-21
9,688 commits to master branch, last one about a year ago
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Created
2019-08-31
2,354 commits to master branch, last one 8 hours ago
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Created
2021-08-30
5,051 commits to master branch, last one 12 hours ago
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created
2021-06-21
11,613 commits to main branch, last one a day ago
MLeap: Deploy ML Pipelines to Production
Created
2016-08-23
1,034 commits to master branch, last one 4 months ago
The best place to learn data engineering. Built and maintained by the data engineering community.
Created
2021-05-04
267 commits to main branch, last one 10 days ago
A system for agentic LLM-powered data processing and ETL
Created
2024-07-09
622 commits to main branch, last one 22 hours ago
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created
2021-07-07
825 commits to main branch, last one about a month ago
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Created
2024-03-20
221 commits to main branch, last one 21 hours ago
Dataform is a framework for managing SQL-based data operations in BigQuery.
Created
2018-09-03
1,739 commits to main branch, last one a day ago
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created
2021-03-22
487 commits to main branch, last one about a year ago
The Feldera Incremental Computation Engine
Created
2023-05-11
3,750 commits to main branch, last one a day ago
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Created
2023-06-23
43 commits to main branch, last one 3 months ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Created
2022-11-06
887 commits to master branch, last one 5 hours ago
One framework to develop, deploy and operate data workflows with Python and SQL.
Created
2021-07-20
2,181 commits to main branch, last one 27 days ago
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Created
2022-01-09
3,208 commits to master branch, last one 16 hours ago
Work with your web service, database, and streaming schemas in a single format.
Created
2022-12-07
339 commits to main branch, last one 8 months ago
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capabilities to automate, schedule, and design data pipelines and workflows. Dataplane is written in Golang with a...
Created
2021-11-23
2,726 commits to main branch, last one 2 months ago
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Created
2023-01-09
51 commits to main branch, last one 3 months ago
Performance Observability for Apache Spark
Created
2023-09-28
384 commits to main branch, last one 15 hours ago
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Created
2021-11-12
741 commits to main branch, last one 14 hours ago
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Created
2020-02-26
331 commits to main branch, last one 2 months ago
Relational data pipelines for the science lab
Created
2012-09-19
4,539 commits to master branch, last one 26 days ago