35 results found

3.5k
14.1k
other
829
Logstash - transport and process your logs, events, or other data
Created 2010-11-18
10,754 commits to main branch, last one 10 hours ago
500
5.7k
mpl-2.0
60
The open source high performance ELT framework powered by Apache Arrow
Created 2020-11-18
17,843 commits to main branch, last one 14 hours ago
258
3.5k
mit
136
Flow-based programming for JavaScript
Created 2011-06-06
2,707 commits to master branch, last one 2 months ago
91
2.7k
other
23
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
497 commits to main branch, last one 16 hours ago
87
1.5k
bsd-3-clause-clear
13
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Created 2023-02-23
1,439 commits to main branch, last one 5 hours ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 4 years ago
144
1.2k
unknown
44
This repository is a getting started guide to Singer.
Created 2016-10-31
188 commits to master branch, last one 3 years ago
58
1.1k
apache-2.0
15
Making data lake work for time series
Created 2021-11-19
732 commits to master branch, last one 8 months ago
38
865
bsd-3-clause-clear
20
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created 2020-05-26
516 commits to main branch, last one 11 months ago
134
747
mit
49
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created 2016-10-19
1,017 commits to master branch, last one 6 days ago
152
576
mit
52
A simplified, lightweight ETL Framework based on Apache Spark
Created 2017-10-10
658 commits to master branch, last one about a year ago
206
449
apache-2.0
21
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created 2022-09-26
2,224 commits to main branch, last one a day ago
22
370
mit
6
Flow PHP - data processing framework
Created 2021-05-23
3,774 commits to 1.x branch, last one 3 days ago
57
345
mit
20
Knowledge Graph Toolkit
Created 2020-01-18
4,527 commits to main branch, last one 8 months ago
35
271
apache-2.0
183
A tool for building feature stores.
Created 2020-01-03
294 commits to staging branch, last one 6 days ago
76
217
apache-2.0
11
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created 2022-03-08
894 commits to main branch, last one 4 hours ago
21
184
apache-2.0
27
Bender - Serverless ETL Framework
This repository has been archived
Created 2016-12-08
228 commits to master branch, last one 2 years ago
40
163
unknown
19
mito ETL tool
Created 2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created 2013-05-31
1,033 commits to master branch, last one 3 months ago
109
153
apache-2.0
43
A visual ETL development and debugging tool for big data
Created 2017-03-09
446 commits to master branch, last one 4 years ago
44
140
unknown
15
(Spatial) data harmonisation with hale studio (formerly HUMBOLDT Alignment Editor)
Created 2013-09-02
9,508 commits to master branch, last one 14 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created 2011-09-28
4,503 commits to main branch, last one 8 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created 2013-03-21
9,402 commits to master branch, last one 18 hours ago
Data pipelines from re-usable components
Created 2020-05-19
253 commits to master branch, last one about a year ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out of the box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created 2021-04-03
1,079 commits to master branch, last one 8 months ago
34
82
gpl-3.0
16
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created 2012-08-28
501 commits to master branch, last one 23 days ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created 2019-06-25
153 commits to master branch, last one about a year ago
3
66
bsd-3-clause
5
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created 2016-09-14
65 commits to master branch, last one 2 years ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago
A framework for moving data into a data warehouse.
Created 2018-01-23
120 commits to master branch, last one 2 years ago