6 results found

589 · 8.2k · mit · 143
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Created 2014-09-27
3,636 commits to master branch, last one 11 months ago
548 · 4.5k · apache-2.0 · 82
the portable Python dataframe library
Created 2015-04-17
8,288 commits to main branch, last one 10 hours ago
281 · 1.8k · apache-2.0 · 41
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, a...
Created 2018-06-15
691 commits to master branch, last one 7 months ago
Work with bioinformatic files using Arrow, Polars, and/or DuckDB
Created 2023-04-22
282 commits to main branch, last one 7 days ago
Command-line interface to quickly generate fake CSV and JSON data
Created 2023-05-25
37 commits to main branch, last one 16 days ago
Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
Created 2022-10-09
243 commits to main branch, last one about a year ago