38 results found
- Filter by Primary Language:
- Python (14)
- Java (7)
- Jupyter Notebook (3)
- JavaScript (2)
- Scala (2)
- C# (2)
- Go (2)
- Shell (1)
- HTML (1)
- Makefile (1)
- PHP (1)
- PLpgSQL (1)
- Ruby (1)
The open source high performance ELT framework powered by Apache Arrow
Created
2020-11-18
19,160 commits to main branch, last one 22 hours ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
867 commits to main branch, last one 17 hours ago
Flow-based programming for JavaScript
Created
2011-06-06
2,707 commits to master branch, last one 7 months ago
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Created
2023-02-23
1,692 commits to main branch, last one a day ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created
2020-02-13
50 commits to master branch, last one 4 years ago
This repository is a getting started guide to Singer.
Created
2016-10-31
189 commits to master branch, last one 2 months ago
Making data lake work for time series
Created
2021-11-19
732 commits to master branch, last one about a year ago
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created
2020-05-26
516 commits to main branch, last one about a year ago
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created
2016-10-19
1,019 commits to master branch, last one 7 days ago
A simplified, lightweight ETL Framework based on Apache Spark
Created
2017-10-10
658 commits to master branch, last one about a year ago
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created
2022-09-26
2,291 commits to main branch, last one 10 days ago
Flow PHP - data processing framework
Created
2021-05-23
4,083 commits to 1.x branch, last one 3 days ago
Knowledge Graph Toolkit
Created
2020-01-18
4,527 commits to main branch, last one about a year ago
A tool for building feature stores.
Created
2020-01-03
308 commits to staging branch, last one about a month ago
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created
2022-03-08
1,130 commits to main branch, last one 18 hours ago
Bender - Serverless ETL Framework
This repository has been archived
Created
2016-12-08
228 commits to master branch, last one 2 years ago
mito ETL tool
Created
2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created
2013-05-31
1,036 commits to master branch, last one about a month ago
A visual ETL development and debugging tool for big data
Created
2017-03-09
446 commits to master branch, last one 5 years ago
(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)
Created
2013-09-02
9,604 commits to master branch, last one 2 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created
2013-03-21
9,988 commits to master branch, last one 22 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created
2011-09-28
4,597 commits to main branch, last one 12 days ago
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
Created
2024-07-11
132 commits to master branch, last one a day ago
Data pipelines from re-usable components
Created
2020-05-19
253 commits to master branch, last one about a year ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created
2021-04-03
1,081 commits to master branch, last one about a month ago
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created
2012-08-28
501 commits to master branch, last one 6 months ago
Logstash - transport and process your logs, events, or other data
Created
2010-11-18
10,888 commits to main branch, last one 18 hours ago
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Created
2024-07-29
64 commits to main branch, last one about a month ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created
2019-06-25
153 commits to master branch, last one about a year ago
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created
2016-09-14
65 commits to master branch, last one 3 years ago