34 results found

65
1.9k
other
22
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Created 2018-04-18
1,037 commits to master branch, last one about a month ago
234
1.5k
apache-2.0
37
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created 2017-07-13
6,411 commits to develop branch, last one about a year ago
156
1.1k
other
83
Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgra...
Created 2016-05-10
851 commits to REL2_x_STABLE branch, last one 4 months ago
124
979
agpl-3.0
16
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Created 2021-08-25
2,402 commits to main branch, last one a day ago
31
883
apache-2.0
8
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Created 2023-08-03
2,153 commits to main branch, last one 2 days ago
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Created 2012-11-09
141 commits to master branch, last one 4 years ago
156
746
apache-2.0
16
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created 2021-03-22
487 commits to main branch, last one about a year ago
35
670
other
8
Advanced and Fast Data Transformation in R
Created 2019-02-27
3,178 commits to master branch, last one 25 days ago
99
627
other
58
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
Created 2015-10-21
240 commits to main branch, last one 6 days ago
24
612
lgpl-3.0
20
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Created 2016-01-17
260 commits to master branch, last one 2 days ago
14
311
mit
20
Like awk, but with SQL and table joins
Created 2015-01-16
241 commits to master branch, last one 2 months ago
25
285
agpl-3.0
3
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Created 2020-09-20
1,920 commits to main branch, last one 6 months ago
12
273
mit
8
📄 Concise selector to extract JSON from HTML.
Created 2017-06-14
348 commits to master branch, last one 3 years ago
14
263
gpl-3.0
6
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
Created 2021-03-11
317 commits to main branch, last one 14 days ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created 2019-12-10
479 commits to master branch, last one about a year ago
32
179
apache-2.0
12
A simple Spark-powered ETL framework that just works 🍺
Created 2019-12-20
627 commits to master branch, last one 2 years ago
A curated list of Clojure resources for dealing with domain-specific languages.
Created 2021-01-03
53 commits to master branch, last one 6 months ago
11
171
epl-2.0
6
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Created 2021-03-19
117 commits to main branch, last one 7 months ago
24
159
gpl-3.0
12
Data transformation and utility functions for R
Created 2015-03-21
952 commits to master branch, last one 9 months ago
35
141
apache-2.0
4
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Created 2019-05-26
3,418 commits to master branch, last one about a year ago
17
99
bsd-3-clause
14
A visual data pipeline builder with various backends
Created 2019-01-23
2,944 commits to master branch, last one 5 days ago
Wrangler Transform: A DMD system for transforming Big Data
Created 2016-11-27
1,539 commits to develop branch, last one 5 days ago
A schema-aware Scala library for data transformation
Created 2021-02-05
544 commits to master branch, last one 12 months ago
breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.
Created 2023-06-02
195 commits to main branch, last one 6 months ago
Data transformation toolkit
Created 2019-08-18
834 commits to main branch, last one about a year ago
All.This is a modular framework for managing and standardizing data structures, enabling seamless interaction across the neurons.me ecosystem. It transforms objects like images, text, and audio into s...
Created 2024-08-02
64 commits to main branch, last one 4 days ago
Examples for working with DataWeave scripts from Apex.
This repository has been archived
Created 2021-11-18
71 commits to main branch, last one about a year ago
Daany - .NET DAta ANalYtics library with implementations of DataFrame, time series decomposition, and linear algebra routines (BLAS and LAPACK).
Created 2019-09-22
395 commits to master branch, last one about a year ago
object flow treatment, data transformation
Created 2016-05-15
1,736 commits to master branch, last one 6 months ago