32 results found
- Filter by Primary Language:
- Python (6)
- TypeScript (4)
- R (2)
- C# (2)
- Scala (2)
- C (2)
- Java (2)
- PHP (2)
- Tcl (1)
- JavaScript (1)
- Clojure (1)
- Elixir (1)
- Go (1)
- HTML (1)
- Apex (1)
- Objective-C (1)
- Ruby (1)
☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Created
2018-04-18
1,034 commits to master branch, last one 20 days ago
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Created
2017-07-13
6,411 commits to develop branch, last one about a year ago
Logical Replication extension for PostgreSQL 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgra...
Created
2016-05-10
851 commits to REL2_x_STABLE branch, last one 2 months ago
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Created
2021-08-25
2,262 commits to main branch, last one 14 hours ago
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Created
2012-11-09
141 commits to master branch, last one 4 years ago
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Created
2021-03-22
487 commits to main branch, last one about a year ago
Advanced and Fast Data Transformation in R
Created
2019-02-27
3,075 commits to master branch, last one 4 days ago
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Mi...
Created
2015-10-21
236 commits to main branch, last one 25 days ago
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Created
2016-01-17
257 commits to master branch, last one about a year ago
Like awk but with SQL and table joins
Created
2015-01-16
239 commits to master branch, last one 6 months ago
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
Created
2020-09-20
1,920 commits to main branch, last one 4 months ago
📄 Concise selector to extract JSON from HTML.
Created
2017-06-14
348 commits to master branch, last one 3 years ago
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
Created
2021-03-11
309 commits to main branch, last one about a month ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created
2019-12-10
479 commits to master branch, last one about a year ago
A simple Spark-powered ETL framework that just works 🍺
Created
2019-12-20
627 commits to master branch, last one 2 years ago
A curated list of Clojure resources for dealing with domain-specific languages.
Created
2021-01-03
53 commits to master branch, last one 3 months ago
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Created
2021-03-19
117 commits to main branch, last one 4 months ago
Data transformation and utility functions for R
Created
2015-03-21
952 commits to master branch, last one 6 months ago
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Created
2014-12-04
919 commits to master branch, last one 10 days ago
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.
Created
2019-05-26
3,418 commits to master branch, last one about a year ago
A visual data pipeline builder with various backends
Created
2019-01-23
2,889 commits to master branch, last one a day ago
Wrangler Transform: A DMD system for transforming Big Data
Created
2016-11-27
1,521 commits to develop branch, last one 3 days ago
A schema-aware Scala library for data transformation
Created
2021-02-05
544 commits to master branch, last one 9 months ago
Data transformation toolkit
Created
2019-08-18
834 commits to main branch, last one 10 months ago
Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
Created
2023-08-03
1,209 commits to main branch, last one 15 hours ago
breadroll 🥟 is a simple lightweight library for data processing operations written in TypeScript and powered by Bun.
Created
2023-06-02
195 commits to main branch, last one 3 months ago
Examples for working with DataWeave scripts from Apex.
This repository has been archived
Created
2021-11-18
71 commits to main branch, last one 9 months ago
Daany - .NET DAta ANalYtics .NET library with the implementation of DataFrame, Time series decompositions and Linear Algebra routines BLAS and LAPACK.
Created
2019-09-22
395 commits to master branch, last one about a year ago
object flow treatment, data transformation
Created
2016-05-15
1,736 commits to master branch, last one 3 months ago
⚡️ Next-generation data transformation framework for TypeScript that puts developer experience first
This repository has been archived
Created
2022-03-23
66 commits to main branch, last one 2 years ago