35 results found
- Filter by Primary Language:
- Python (12)
- Java (7)
- Jupyter Notebook (3)
- Go (2)
- Scala (2)
- C# (2)
- Ruby (1)
- Shell (1)
- HTML (1)
- JavaScript (1)
- Makefile (1)
- PHP (1)
- PLpgSQL (1)
Logstash - transport and process your logs, events, or other data
Created
2010-11-18
10,754 commits to main branch, last one 10 hours ago
The open source high performance ELT framework powered by Apache Arrow
Created
2020-11-18
17,843 commits to main branch, last one 14 hours ago
Flow-based programming for JavaScript
Created
2011-06-06
2,707 commits to master branch, last one 2 months ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
497 commits to main branch, last one 16 hours ago
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Created
2023-02-23
1,439 commits to main branch, last one 5 hours ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created
2020-02-13
50 commits to master branch, last one 4 years ago
This repository is a getting started guide to Singer.
Created
2016-10-31
188 commits to master branch, last one 3 years ago
Making data lake work for time series
Created
2021-11-19
732 commits to master branch, last one 8 months ago
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created
2020-05-26
516 commits to main branch, last one 11 months ago
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created
2016-10-19
1,017 commits to master branch, last one 6 days ago
A simplified, lightweight ETL Framework based on Apache Spark
Created
2017-10-10
658 commits to master branch, last one about a year ago
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created
2022-09-26
2,224 commits to main branch, last one a day ago
Flow PHP - data processing framework
Created
2021-05-23
3,774 commits to 1.x branch, last one 3 days ago
Knowledge Graph Toolkit
Created
2020-01-18
4,527 commits to main branch, last one 8 months ago
A tool for building feature stores.
Created
2020-01-03
294 commits to staging branch, last one 6 days ago
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created
2022-03-08
894 commits to main branch, last one 4 hours ago
Bender - Serverless ETL Framework
This repository has been archived
Created
2016-12-08
228 commits to master branch, last one 2 years ago
mito ETL tool
Created
2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created
2013-05-31
1,033 commits to master branch, last one 3 months ago
A visual ETL development and debugging tool for big data
Created
2017-03-09
446 commits to master branch, last one 4 years ago
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Created
2013-09-02
9,508 commits to master branch, last one 14 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created
2011-09-28
4,503 commits to main branch, last one 8 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created
2013-03-21
9,402 commits to master branch, last one 18 hours ago
Data pipelines from re-usable components
Created
2020-05-19
253 commits to master branch, last one about a year ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out of the box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created
2021-04-03
1,079 commits to master branch, last one 8 months ago
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created
2012-08-28
501 commits to master branch, last one 23 days ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created
2019-06-25
153 commits to master branch, last one about a year ago
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Created
2016-09-14
65 commits to master branch, last one 2 years ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created
2019-11-16
15 commits to master branch, last one about a year ago
A framework for moving data into a data warehouse.
Created
2018-01-23
120 commits to master branch, last one 2 years ago