37 results found
- Filter by Primary Language:
- Python (14)
- Java (7)
- Jupyter Notebook (3)
- Go (2)
- Scala (2)
- JavaScript (2)
- C# (2)
- Shell (1)
- HTML (1)
- Makefile (1)
- PLpgSQL (1)
- Ruby (1)
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
1,233 commits to main branch, last one 7 hours ago
Logstash - transport and process your logs, events, or other data
Created
2010-11-18
11,045 commits to main branch, last one 4 days ago
The developer-first cloud governance platform
Created
2020-11-18
19,943 commits to main branch, last one 2 days ago
Flow-based programming for JavaScript
Created
2011-06-06
2,707 commits to master branch, last one about a year ago
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Created
2023-02-23
1,750 commits to main branch, last one 16 hours ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created
2020-02-13
50 commits to master branch, last one 5 years ago
This repository is a getting started guide to Singer.
Created
2016-10-31
189 commits to master branch, last one 7 months ago
Making data lake work for time series
Created
2021-11-19
732 commits to master branch, last one about a year ago
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created
2020-05-26
516 commits to main branch, last one about a year ago
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created
2016-10-19
1,025 commits to master branch, last one 4 months ago
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Created
2022-09-26
2,315 commits to main branch, last one 17 days ago
A simplified, lightweight ETL Framework based on Apache Spark
Created
2017-10-10
658 commits to master branch, last one 2 years ago
Knowledge Graph Toolkit
Created
2020-01-18
4,527 commits to main branch, last one about a year ago
A tool for building feature stores.
Created
2020-01-03
336 commits to staging branch, last one 3 days ago
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created
2022-03-08
1,177 commits to main branch, last one 3 days ago
Bender - Serverless ETL Framework
This repository has been archived
Created
2016-12-08
228 commits to master branch, last one 3 years ago
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
Created
2024-07-11
147 commits to master branch, last one 9 days ago
mito ETL tool
Created
2013-05-24
98 commits to master branch, last one 3 years ago
Configurable Extract, Transform, and Load
Created
2013-05-31
1,037 commits to master branch, last one a day ago
A visual ETL development and debugging tool for big data
Created
2017-03-09
446 commits to master branch, last one 5 years ago
(Spatial) data harmonisation with hale»studio (formerly HUMBOLDT Alignment Editor)
Created
2013-09-02
9,686 commits to master branch, last one 7 days ago
The Frank!Framework is an easy-to-use, stateless integration framework which allows (transactional) messages to be modified and exchanged between different systems.
Created
2013-03-21
10,470 commits to master branch, last one 11 hours ago
Global Biotic Interactions provides access to existing species interaction datasets
Created
2011-09-28
4,706 commits to main branch, last one 10 days ago
Data pipelines from re-usable components
Created
2020-05-19
253 commits to master branch, last one 2 years ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out of the box CMS, blog, forum and email functionality. Developer friendly & easily extendable for your next ...
Created
2021-04-03
1,082 commits to master branch, last one 4 months ago
Build super simple end-to-end data & ETL pipelines for your vector databases and Generative AI applications
Created
2024-07-29
64 commits to main branch, last one 6 months ago
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Created
2012-08-28
501 commits to master branch, last one 10 months ago
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
This repository has been archived
Created
2019-06-25
153 commits to master branch, last one 2 years ago
Ylem is an open-source platform for real-time data streaming orchestration
Created
2024-09-02
23 commits to main branch, last one about a month ago
csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Created
2016-09-14
65 commits to master branch, last one 3 years ago