88 results found
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Created
2016-06-14
520 commits to master branch, last one 2 months ago
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Created
2015-05-03
8,937 commits to main branch, last one 10 days ago
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Created
2020-09-22
710 commits to master branch, last one 29 days ago
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Created
2018-06-01
3,718 commits to main branch, last one 16 hours ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
823 commits to main branch, last one 10 hours ago
A light-weight, flexible, and expressive statistical data testing library
Created
2018-11-01
777 commits to main branch, last one 5 days ago
Concurrent and multi-stage data ingestion and data processing with Elixir
Created
2018-11-05
396 commits to main branch, last one 3 months ago
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Created
2017-07-22
1,719 commits to master branch, last one 4 years ago
Large-scale pretraining for dialogue
Created
2019-08-29
83 commits to master branch, last one 2 years ago
Extract Transform Load for Python 3.5+
Created
2016-12-09
981 commits to develop branch, last one 3 years ago
Python Stream Processing
Created
2022-02-04
2,528 commits to main branch, last one 8 days ago
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Created
2017-02-10
547 commits to main branch, last one about a year ago
Kubernetes-native platform to run massively parallel data/streaming jobs
Created
2022-05-20
1,138 commits to main branch, last one 2 days ago
Data and tools for generating and inspecting OLMo pre-training data.
Created
2023-06-20
349 commits to main branch, last one 16 days ago
A tool that uses advanced Monte Carlo simulations and Turbit parallel processing to create possible Bitcoin prediction scenarios.
Created
2024-08-02
6 commits to main branch, last one 3 months ago
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
This repository has been archived
Created
2014-12-03
2,332 commits to master branch, last one 6 years ago
Large-scale pretrained models for goal-directed dialog
Created
2022-05-10
43 commits to main branch, last one about a year ago
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Created
2019-03-08
496 commits to master branch, last one 2 years ago
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Created
2020-08-31
1,725 commits to main branch, last one about a month ago
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON docu...
Created
2015-06-11
753 commits to master branch, last one 6 months ago
Advanced and Fast Data Transformation in R
Created
2019-02-27
3,062 commits to master branch, last one 4 days ago
All-in-one text de-duplication
Created
2021-03-13
363 commits to main branch, last one 5 months ago
A list about Apache Kafka
Created
2016-04-29
82 commits to master branch, last one 9 months ago
Scalable data preprocessing and curation toolkit for LLMs
Created
2024-03-14
182 commits to main branch, last one 8 hours ago
👾~ music, eternal ~ 👾
Created
2019-01-17
346 commits to master branch, last one about a year ago
Machine Learning notebooks for refreshing concepts.
Topics: python, deep-learning, data-processing, neural-networks, machine-learning, model-evaluation, regression-models, clustering-methods, classification-trees, data-science-notebook, deep-learning-tutorial, reinforcement-learning, python-machine-learning, deep-learning-algorithms, dimensionality-reduction, machine-learning-tutorials, machine-learning-algorithms, natural-language-processing
Created
2017-10-30
117 commits to master branch, last one 6 years ago
Harmonious distributed data analysis in Rust.
Created
2018-10-17
550 commits to master branch, last one 3 years ago
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Created
2016-05-15
484 commits to master branch, last one a day ago
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Created
2018-10-05
30 commits to master branch, last one 3 years ago
PHP - ETL (Extract Transform Load) data processing library
Created
2020-10-26
908 commits to 1.x branch, last one 12 days ago