47 results found
- Filter by Primary Language:
- Python (23)
- Go (7)
- Java (5)
- TypeScript (3)
- JavaScript (2)
- Shell (1)
- Jupyter Notebook (1)
- Groovy (1)
- C++ (1)
- PLpgSQL (1)
- Rust (1)
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
Created
2015-04-13
26,564 commits to main branch, last one 9 hours ago
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Created
2020-07-27
22,880 commits to master branch, last one 6 hours ago
Apache Doris is an easy-to-use, high performance and unified analytics database.
Created
2017-08-10
22,797 commits to master branch, last one 14 hours ago
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
Created
2016-03-10
6,917 commits to main branch, last one 9 hours ago
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Created
2017-08-05
4,409 commits to dev branch, last one a day ago
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Created
2022-05-16
5,465 commits to master branch, last one 5 days ago
The open source high performance ELT framework powered by Apache Arrow
Created
2020-11-18
19,093 commits to main branch, last one 9 hours ago
Flink CDC is a streaming data integration tool
Created
2020-07-27
1,118 commits to master branch, last one 18 hours ago
Privacy and Security focused Segment-alternative, in Golang and React
Created
2019-07-19
5,687 commits to master branch, last one 22 hours ago
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Created
2022-01-26
3,374 commits to devel branch, last one 12 hours ago
Open-source BI for engineers
Created
2024-02-20
396 commits to main branch, last one 8 days ago
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Created
2021-06-21
11,613 commits to main branch, last one a day ago
Efficient data transformation and modeling framework that is backwards compatible with dbt.
Created
2022-09-23
2,933 commits to main branch, last one 20 hours ago
A system for agentic LLM-powered data processing and ETL
Created
2024-07-09
622 commits to main branch, last one 22 hours ago
Dataform is a framework for managing SQL based data operations in BigQuery
Created
2018-09-03
1,739 commits to main branch, last one a day ago
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as ...
Created
2021-04-08
416 commits to master branch, last one 2 years ago
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created
2021-03-22
487 commits to main branch, last one about a year ago
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift, Databricks) in real-time.
Created
2022-11-06
887 commits to master branch, last one 5 hours ago
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Created
2019-09-27
4,037 commits to master branch, last one 5 months ago
dbt + Metabase integration
Created
2019-12-12
178 commits to master branch, last one 5 days ago
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
Created
2020-10-15
1,853 commits to main branch, last one 12 days ago
One framework to develop, deploy and operate data workflows with Python and SQL.
Created
2021-07-20
2,181 commits to main branch, last one 27 days ago
ReplicaDB is an open source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.
Created
2018-12-05
442 commits to master branch, last one 6 months ago
Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Created
2021-12-06
1,162 commits to main branch, last one 4 months ago
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Created
2018-05-08
14 commits to master branch, last one 4 years ago
Use SQL to build ELT pipelines on a data lakehouse.
Created
2021-03-11
481 commits to main branch, last one 2 years ago
The dbt data-validation toolkit for teams that care about building better data
Created
2023-10-06
1,558 commits to main branch, last one 20 hours ago
CLI tool for dbt users to simplify creation of staging model files (yml and sql)
Created
2021-06-28
891 commits to main branch, last one 13 days ago
PyAirbyte brings the power of Airbyte to every Python developer.
Created
2024-02-04
238 commits to main branch, last one 8 days ago
Play detective on Reddit: Discover political disinformation campaigns, secret influencers and more
Created
2020-10-20
102 commits to main branch, last one 2 years ago