38 results found

513
5.9k
mpl-2.0
63
The open source high performance ELT framework powered by Apache Arrow
Created 2020-11-18
19,160 commits to main branch, last one 22 hours ago
139
4.3k
other
29
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
867 commits to main branch, last one 17 hours ago
263
3.5k
mit
135
Flow-based programming for JavaScript
Created 2011-06-06
2,707 commits to master branch, last one 7 months ago
125
1.9k
bsd-3-clause-clear
17
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Created 2023-02-23
1,692 commits to main branch, last one a day ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 4 years ago
147
1.3k
unknown
44
This repository is a getting started guide to Singer.
Created 2016-10-31
189 commits to master branch, last one 2 months ago
60
1.1k
apache-2.0
15
Making data lake work for time series
Created 2021-11-19
732 commits to master branch, last one about a year ago
37
863
bsd-3-clause-clear
20
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created 2020-05-26
516 commits to main branch, last one about a year ago
135
802
mit
49
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created 2016-10-19
1,019 commits to master branch, last one 7 days ago
155
584
mit
53
A simplified, lightweight ETL Framework based on Apache Spark
Created 2017-10-10
658 commits to master branch, last one about a year ago
265
568
apache-2.0
25
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created 2022-09-26
2,291 commits to main branch, last one 10 days ago
28
485
mit
9
Flow PHP - data processing framework
Created 2021-05-23
4,083 commits to 1.x branch, last one 3 days ago
57
357
mit
20
Knowledge Graph Toolkit
Created 2020-01-18
4,527 commits to main branch, last one about a year ago
36
283
apache-2.0
186
A tool for building feature stores.
Created 2020-01-03
308 commits to staging branch, last one about a month ago
82
235
apache-2.0
13
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created 2022-03-08
1,130 commits to main branch, last one 18 hours ago
21
186
apache-2.0
26
Bender - Serverless ETL Framework
This repository has been archived
Created 2016-12-08
228 commits to master branch, last one 2 years ago
41
161
unknown
19
mito ETL tool
Created 2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created 2013-05-31
1,036 commits to master branch, last one about a month ago
107
154
apache-2.0
43
A visual ETL development and debugging tool for big data
Created 2017-03-09
446 commits to master branch, last one 5 years ago
44
147
unknown
15
(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)
Created 2013-09-02
9,604 commits to master branch, last one 2 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created 2013-03-21
9,988 commits to master branch, last one 22 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created 2011-09-28
4,597 commits to main branch, last one 12 days ago
8
119
other
4
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
Created 2024-07-11
132 commits to master branch, last one a day ago
Data pipelines from re-usable components
Created 2020-05-19
253 commits to master branch, last one about a year ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created 2021-04-03
1,081 commits to master branch, last one about a month ago
35
85
gpl-3.0
16
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created 2012-08-28
501 commits to master branch, last one 6 months ago
3.5k
82
other
181
Logstash - transport and process your logs, events, or other data
Created 2010-11-18
10,888 commits to main branch, last one 18 hours ago
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Created 2024-07-29
64 commits to main branch, last one about a month ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created 2019-06-25
153 commits to master branch, last one about a year ago
3
67
bsd-3-clause
5
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created 2016-09-14
65 commits to master branch, last one 3 years ago