38 results found

3.5k
14.3k
other
832
Logstash - transport and process your logs, events, or other data
Created 2010-11-18
10,918 commits to main branch, last one a day ago
156
7.4k
other
29
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
947 commits to main branch, last one a day ago
512
5.9k
mpl-2.0
65
The open source high performance ELT framework powered by Apache Arrow
Created 2020-11-18
19,457 commits to main branch, last one a day ago
264
3.5k
mit
135
Flow-based programming for JavaScript
Created 2011-06-06
2,707 commits to master branch, last one 8 months ago
129
1.9k
bsd-3-clause-clear
18
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Created 2023-02-23
1,722 commits to main branch, last one 3 days ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 4 years ago
147
1.3k
unknown
44
This repository is a getting started guide to Singer.
Created 2016-10-31
189 commits to master branch, last one 3 months ago
59
1.1k
apache-2.0
15
Making data lake work for time series
Created 2021-11-19
732 commits to master branch, last one about a year ago
37
862
bsd-3-clause-clear
20
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created 2020-05-26
516 commits to main branch, last one about a year ago
136
807
mit
49
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created 2016-10-19
1,025 commits to master branch, last one 22 days ago
272
596
apache-2.0
25
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created 2022-09-26
2,294 commits to main branch, last one 3 days ago
155
585
mit
53
A simplified, lightweight ETL Framework based on Apache Spark
Created 2017-10-10
658 commits to master branch, last one about a year ago
28
516
mit
10
Flow PHP - data processing framework
Created 2021-05-23
4,142 commits to 1.x branch, last one 15 hours ago
58
364
mit
20
Knowledge Graph Toolkit
Created 2020-01-18
4,527 commits to main branch, last one about a year ago
36
289
apache-2.0
197
A tool for building feature stores.
Created 2020-01-03
308 commits to staging branch, last one 2 months ago
82
236
apache-2.0
13
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created 2022-03-08
1,154 commits to main branch, last one 2 days ago
20
186
apache-2.0
26
Bender - Serverless ETL Framework
This repository has been archived
Created 2016-12-08
228 commits to master branch, last one 2 years ago
41
161
unknown
19
mito ETL tool
Created 2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created 2013-05-31
1,036 commits to master branch, last one 2 months ago
107
154
apache-2.0
43
A visual ETL development and debugging tool for big data
Created 2017-03-09
446 commits to master branch, last one 5 years ago
44
147
unknown
15
(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)
Created 2013-09-02
9,628 commits to master branch, last one 3 days ago
9
132
other
4
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
Created 2024-07-11
134 commits to master branch, last one 4 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created 2013-03-21
10,106 commits to master branch, last one a day ago
Global Biotic Interactions provides access to existing species interaction datasets
Created 2011-09-28
4,618 commits to main branch, last one a day ago
Data pipelines from re-usable components
Created 2020-05-19
253 commits to master branch, last one about a year ago
an app engine for your business. Seamlessly implement business logic with a powerful API. Out of the box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created 2021-04-03
1,082 commits to master branch, last one 20 days ago
35
86
gpl-3.0
16
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created 2012-08-28
501 commits to master branch, last one 7 months ago
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Created 2024-07-29
64 commits to main branch, last one 2 months ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created 2019-06-25
153 commits to master branch, last one 2 years ago
3
67
bsd-3-clause
5
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created 2016-09-14
65 commits to master branch, last one 3 years ago