37 results found

349
23.6k
other
47
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
1,233 commits to main branch, last one 7 hours ago
3.5k
14.4k
other
831
Logstash - transport and process your logs, events, or other data
Created 2010-11-18
11,045 commits to main branch, last one 4 days ago
525
6.1k
mpl-2.0
65
The developer-first cloud governance platform
Created 2020-11-18
19,943 commits to main branch, last one 2 days ago
265
3.5k
mit
133
Flow-based programming for JavaScript
Created 2011-06-06
2,707 commits to master branch, last one about a year ago
143
2.1k
bsd-3-clause-clear
18
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Created 2023-02-23
1,750 commits to main branch, last one 16 hours ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 5 years ago
145
1.3k
unknown
41
This repository is a getting started guide to Singer.
Created 2016-10-31
189 commits to master branch, last one 7 months ago
59
1.2k
apache-2.0
14
Making data lake work for time series
Created 2021-11-19
732 commits to master branch, last one about a year ago
36
861
bsd-3-clause-clear
18
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived.
Created 2020-05-26
516 commits to main branch, last one about a year ago
140
823
mit
50
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created 2016-10-19
1,025 commits to master branch, last one 4 months ago
297
657
apache-2.0
24
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created 2022-09-26
2,315 commits to main branch, last one 17 days ago
156
585
mit
50
A simplified, lightweight ETL Framework based on Apache Spark
Created 2017-10-10
658 commits to master branch, last one 2 years ago
59
378
mit
19
Knowledge Graph Toolkit
Created 2020-01-18
4,527 commits to main branch, last one about a year ago
37
299
apache-2.0
197
A tool for building feature stores.
Created 2020-01-03
336 commits to staging branch, last one 3 days ago
82
240
apache-2.0
10
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created 2022-03-08
1,177 commits to main branch, last one 3 days ago
20
185
apache-2.0
25
Bender - Serverless ETL Framework
This repository has been archived.
Created 2016-12-08
228 commits to master branch, last one 3 years ago
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
Created 2024-07-11
147 commits to master branch, last one 9 days ago
41
163
unknown
18
mito ETL tool
Created 2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created 2013-05-31
1,037 commits to master branch, last one a day ago
107
153
apache-2.0
42
A visual ETL development and debugging tool for big data
Created 2017-03-09
446 commits to master branch, last one 5 years ago
44
146
unknown
14
(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)
Created 2013-09-02
9,686 commits to master branch, last one 7 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created 2013-03-21
10,470 commits to master branch, last one 11 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created 2011-09-28
4,706 commits to main branch, last one 10 days ago
Data pipelines from re-usable components
Created 2020-05-19
253 commits to master branch, last one 2 years ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out of the box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created 2021-04-03
1,082 commits to master branch, last one 4 months ago
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Created 2024-07-29
64 commits to main branch, last one 6 months ago
35
86
gpl-3.0
15
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created 2012-08-28
501 commits to master branch, last one 10 months ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived.
Created 2019-06-25
153 commits to master branch, last one 2 years ago
0
69
apache-2.0
1
Ylem is an open-source platform for real-time data streaming orchestration
Created 2024-09-02
23 commits to main branch, last one about a month ago
3
67
bsd-3-clause
4
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created 2016-09-14
65 commits to master branch, last one 3 years ago