10 results found

285
1.8k
apache-2.0
40
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, a...
Created 2018-06-15
691 commits to master branch, last one 10 months ago
134
796
mit
50
ETL framework for .NET (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Created 2016-10-19
1,018 commits to master branch, last one 15 days ago
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Created 2018-08-26
484 commits to master branch, last one about a month ago
13
126
other
14
Tools to Transform and Query Data with 'Apache' 'Drill'
Created 2016-06-03
152 commits to master branch, last one 2 years ago
7
123
apache-2.0
4
Query and transform data with PRQL
This repository has been archived
Created 2022-10-11
140 commits to main branch, last one about a year ago
A converter from OSM PBF files to Parquet files
Created 2016-04-03
34 commits to master branch, last one 4 years ago
OSM planet dump high performance data loader. Transform OpenStreetMap World/Region PBF dump into partitioned by H3 regions PostGIS pgsnapshot (lossless) OSM schema representation and/or into ArrowIPC/...
Created 2023-01-29
70 commits to master branch, last one 26 days ago
14
85
apache-2.0
30
MongoDB integrations for Apache Arrow. Export MongoDB documents to NumPy arrays, Parquet files, and pandas DataFrames in one line of code.
Created 2021-01-19
220 commits to main branch, last one 5 days ago
A lightweight Java library that facilitates reading and writing Apache Parquet files without Hadoop dependencies
Created 2020-09-30
131 commits to master branch, last one about a month ago