54 results found
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created
2015-04-13
26,842 commits to main branch, last one 15 hours ago
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Created
2023-12-12
1,659 commits to main branch, last one 21 hours ago
Apache DolphinScheduler is the modern data orchestration platform. Build high-performance workflows quickly with low code.
Created
2019-03-01
8,552 commits to dev branch, last one a day ago
An orchestration platform for the development, production, and observation of data assets.
Created
2018-04-30
21,177 commits to master branch, last one 10 hours ago
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
Created
2022-09-26
1,620 commits to main branch, last one 17 hours ago
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created
2022-05-16
5,495 commits to master branch, last one 2 days ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
867 commits to main branch, last one 17 hours ago
Build data pipelines, the easy way 🛠️
Created
2020-05-21
9,688 commits to master branch, last one about a year ago
Lean and mean distributed stream processing system written in Rust and WebAssembly. Alternative to Kafka + Flink in one.
Topics: rust, stateful, data-flow, real-time, streaming, serverless, webassembly, cloud-native, data-analytics, data-pipelines, streaming-data, data-integration, stream-processing, distributed-systems, streaming-analytics, stream-processing-engine, streaming-data-pipelines, event-driven-architecture, streaming-data-processing
Created
2019-08-31
2,371 commits to master branch, last one a day ago
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Created
2021-08-30
5,068 commits to master branch, last one a day ago
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created
2021-06-21
11,643 commits to main branch, last one 9 hours ago
MLeap: Deploy ML Pipelines to Production
Created
2016-08-23
1,050 commits to master branch, last one 8 days ago
The best place to learn data engineering. Built and maintained by the data engineering community.
Created
2021-05-04
272 commits to main branch, last one 2 days ago
A system for agentic LLM-powered data processing and ETL
Created
2024-07-09
679 commits to main branch, last one a day ago
First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.
Created
2021-07-07
825 commits to main branch, last one 2 months ago
Visual Data Transformation with Python Code Generation. Low-Code Python-based ETL.
Created
2024-03-20
239 commits to main branch, last one 3 days ago
Dataform is a framework for managing SQL-based data operations in BigQuery
Created
2018-09-03
1,740 commits to main branch, last one 9 days ago
The Feldera Incremental Computation Engine
Created
2023-05-11
3,829 commits to main branch, last one a day ago
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created
2021-03-22
487 commits to main branch, last one about a year ago
Kickstart your MLOps initiative with a flexible, robust, and productive Python package.
Created
2023-06-23
43 commits to main branch, last one 3 months ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Created
2022-11-06
917 commits to master branch, last one a day ago
One framework to develop, deploy and operate data workflows with Python and SQL.
Created
2021-07-20
2,182 commits to main branch, last one 13 days ago
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service wit...
Created
2022-01-09
3,211 commits to master branch, last one 19 hours ago
Work with your web service, database, and streaming schemas in a single format.
Created
2022-12-07
339 commits to main branch, last one 8 months ago
Dataplane is an Airflow-inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a...
Created
2021-11-23
2,726 commits to main branch, last one 2 months ago
Performance Observability for Apache Spark
Created
2023-09-28
385 commits to main branch, last one 4 days ago
A curated list of awesome projects and resources related to Kubeflow (a CNCF incubating project)
Created
2023-01-09
54 commits to main branch, last one 2 days ago
Main repo including core data model, data marts, reference data, terminology, and the clinical concept library
Created
2021-11-12
746 commits to main branch, last one 5 days ago
A lightweight CLI tool for versioning data alongside source code and building data pipelines.
Created
2020-02-26
331 commits to main branch, last one 2 months ago
Relational data pipelines for the science lab
Created
2012-09-19
4,539 commits to master branch, last one about a month ago