81 results found
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Created
2015-05-03
8,836 commits to main branch, last one 20 hours ago
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Created
2016-06-14
515 commits to master branch, last one 9 months ago
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Created
2018-06-01
3,572 commits to main branch, last one 21 hours ago
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Created
2020-09-22
697 commits to master branch, last one 15 days ago
A light-weight, flexible, and expressive statistical data testing library
Created
2018-11-01
722 commits to main branch, last one a day ago
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Created
2017-07-22
1,719 commits to master branch, last one 3 years ago
Large-scale pretraining for dialogue
Created
2019-08-29
83 commits to master branch, last one about a year ago
Concurrent and multi-stage data ingestion and data processing with Elixir
Created
2018-11-05
392 commits to main branch, last one 8 days ago
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created
2022-11-27
414 commits to main branch, last one a day ago
Extract Transform Load for Python 3.5+
Created
2016-12-09
981 commits to develop branch, last one 3 years ago
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Created
2017-02-10
547 commits to main branch, last one about a year ago
Python Stream Processing
Created
2022-02-04
2,345 commits to main branch, last one a day ago
Kubernetes-native platform to run massively parallel data/streaming jobs
Created
2022-05-20
914 commits to main branch, last one 16 hours ago
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
This repository has been archived
Created
2014-12-03
2,332 commits to master branch, last one 5 years ago
Large-scale pretrained models for goal-directed dialog
Created
2022-05-10
43 commits to main branch, last one about a year ago
Data and tools for generating and inspecting OLMo pre-training data.
Created
2023-06-20
313 commits to main branch, last one 10 days ago
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Created
2019-03-08
496 commits to master branch, last one 2 years ago
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Created
2020-08-31
1,718 commits to main branch, last one 23 hours ago
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON docu...
Created
2015-06-11
753 commits to master branch, last one about a month ago
Advanced and Fast Data Transformation in R
Created
2019-02-27
2,903 commits to master branch, last one a day ago
A list about Apache Kafka
Created
2016-04-29
82 commits to master branch, last one 3 months ago
All-in-one text de-duplication
Created
2021-03-13
363 commits to main branch, last one 10 days ago
👾~ music, eternal ~ 👾
Created
2019-01-17
346 commits to master branch, last one about a year ago
Harmonious distributed data analysis in Rust.
Created
2018-10-17
550 commits to master branch, last one 3 years ago
Machine Learning notebooks for refreshing concepts.
Topics: python, deep-learning, data-processing, neural-networks, machine-learning, model-evaluation, regression-models, clustering-methods, classification-trees, data-science-notebook, deep-learning-tutorial, reinforcement-learning, python-machine-learning, deep-learning-algorithms, dimensionality-reduction, machine-learning-tutorials, machine-learning-algorithms, natural-language-processing
Created
2017-10-30
117 commits to master branch, last one 5 years ago
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Created
2016-05-15
448 commits to master branch, last one 3 days ago
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Created
2018-10-05
30 commits to master branch, last one 2 years ago
PHP - ETL (Extract Transform Load) data processing library
Created
2020-10-26
853 commits to 1.x branch, last one 6 days ago
Production-ready data processing made easy and shareable
Created
2023-03-02
625 commits to main branch, last one 3 days ago
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Created
2018-04-23
4,279 commits to master branch, last one a day ago