81 results found

202
8.6k
other
68
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Created 2015-05-03
8,836 commits to main branch, last one 20 hours ago
A collection of handy Bash One-Liners and terminal tricks for data processing and Linux system maintenance.
Created 2016-06-14
515 commits to master branch, last one 9 months ago
606
4.9k
apache-2.0
94
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Created 2018-06-01
3,572 commits to main branch, last one 21 hours ago
112
4.9k
mit
31
Select, put and delete data from JSON, TOML, YAML, XML and CSV files with a single tool. Supports conversion between formats and can be used as a Go package.
Created 2020-09-22
697 commits to master branch, last one 15 days ago
283
3.1k
mit
18
A light-weight, flexible, and expressive statistical data testing library
Created 2018-11-01
722 commits to main branch, last one a day ago
371
2.4k
apache-2.0
78
Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/
Created 2017-07-22
1,719 commits to master branch, last one 3 years ago
342
2.3k
mit
55
Large-scale pretraining for dialogue
Created 2019-08-29
83 commits to master branch, last one about a year ago
153
2.3k
apache-2.0
48
Concurrent and multi-stage data ingestion and data processing with Elixir
Created 2018-11-05
392 commits to main branch, last one 8 days ago
87
2.3k
other
20
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Created 2022-11-27
414 commits to main branch, last one a day ago
143
1.6k
apache-2.0
58
Extract Transform Load for Python 3.5+
Created 2016-12-09
981 commits to develop branch, last one 3 years ago
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Created 2017-02-10
547 commits to main branch, last one about a year ago
56
1.3k
apache-2.0
14
Python Stream Processing
Created 2022-02-04
2,345 commits to main branch, last one a day ago
98
940
apache-2.0
17
Kubernetes-native platform to run massively parallel data/streaming jobs
Created 2022-05-20
914 commits to main branch, last one 16 hours ago
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
This repository has been archived
Created 2014-12-03
2,332 commits to master branch, last one 5 years ago
110
841
mit
20
Large-scale pretrained models for goal-directed dialog
Created 2022-05-10
43 commits to main branch, last one about a year ago
79
811
apache-2.0
17
Data and tools for generating and inspecting OLMo pre-training data.
Created 2023-06-20
313 commits to main branch, last one 10 days ago
118
744
apache-2.0
24
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Created 2019-03-08
496 commits to master branch, last one 2 years ago
56
693
bsd-3-clause
23
HStreamDB is an open-source, cloud-native streaming database for IoT and beyond. Modernize your data stack for real-time applications.
Created 2020-08-31
1,718 commits to main branch, last one 23 hours ago
38
655
gpl-3.0
26
Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON docu...
Created 2015-06-11
753 commits to master branch, last one about a month ago
160
566
unknown
31
A list about Apache Kafka
Created 2016-04-29
82 commits to master branch, last one 3 months ago
66
518
apache-2.0
4
All-in-one text de-duplication
Created 2021-03-13
363 commits to main branch, last one 10 days ago
31
512
mit
16
👾~ music, eternal ~ 👾
Created 2019-01-17
346 commits to master branch, last one about a year ago
26
470
apache-2.0
19
Harmonious distributed data analysis in Rust.
Created 2018-10-17
550 commits to master branch, last one 3 years ago
213
452
agpl-3.0
37
Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon
Created 2016-05-15
448 commits to master branch, last one 3 days ago
Deal with bad samples in your dataset dynamically, use Transforms as Filters, and more!
Created 2018-10-05
30 commits to master branch, last one 2 years ago
21
340
mit
19
PHP - ETL (Extract Transform Load) data processing library
Created 2020-10-26
853 commits to 1.x branch, last one 6 days ago
25
322
apache-2.0
6
Production-ready data processing made easy and shareable
Created 2023-03-02
625 commits to main branch, last one 3 days ago
97
307
apache-2.0
14
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Created 2018-04-23
4,279 commits to master branch, last one a day ago