107 results found

3.7k
15.2k
apache-2.0
348
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
Created 2016-02-17
17,364 commits to main branch, last one 22 hours ago
164
3.8k
other
28
Command-line tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.
Created 2022-01-10
102 commits to main branch, last one about a year ago
188
3.3k
apache-2.0
42
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created 2020-12-11
290 commits to main branch, last one 7 days ago
901
2.9k
apache-2.0
50
Official Rust implementation of Apache Arrow
Created 2021-04-17
6,404 commits to main branch, last one 15 hours ago
1.5k
2.8k
apache-2.0
90
Apache Parquet Java
Created 2014-06-10
2,761 commits to master branch, last one 6 hours ago
78
2.7k
unlicense
17
Blazing-fast Data-Wrangling toolkit
Created 2020-12-11
12,091 commits to master branch, last one 13 hours ago
129
2.0k
apache-2.0
12
Rill is a tool for effortlessly transforming data sets into powerful, opinionated dashboards using SQL. BI-as-code.
Created 2021-12-09
4,593 commits to main branch, last one 7 hours ago
976
2.0k
apache-2.0
149
Apache Drill is a distributed MPP query layer for self-describing data
Created 2012-09-05
4,530 commits to master branch, last one 17 days ago
439
1.9k
apache-2.0
69
Apache Parquet Format
Created 2014-06-10
405 commits to master branch, last one 6 days ago
281
1.8k
apache-2.0
37
Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, a...
Created 2018-06-15
691 commits to master branch, last one about a year ago
358
1.8k
apache-2.0
137
A large-scale entity and relation database supporting aggregation of properties
Created 2015-12-14
7,331 commits to develop branch, last one about a month ago
140
1.3k
apache-2.0
10
cryo is the easiest way to extract blockchain data to parquet, csv, json, or python dataframes
Created 2023-06-27
443 commits to main branch, last one 2 months ago
91
1.3k
apache-2.0
17
Quilt is a data mesh for connecting people with actionable data
Created 2017-02-10
4,969 commits to master branch, last one 2 days ago
32
1.3k
agpl-3.0
7
Single-binary Postgres read replica optimized for analytics
Created 2024-11-04
321 commits to main branch, last one 2 days ago
Postgres-Native Data Warehouse
Created 2024-09-05
97 commits to main branch, last one 13 days ago
311
1.0k
apache-2.0
96
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
Created 2013-11-19
2,036 commits to master branch, last one 28 days ago
54
959
apache-2.0
16
A portable embedded database using Arrow.
Created 2024-07-15
357 commits to main branch, last one a day ago
104
851
gpl-3.0
11
Simple Windows desktop application for viewing & querying Apache Parquet files
Created 2018-05-31
378 commits to main branch, last one about a month ago
139
821
mit
50
ETL framework for .NET (Parser / Writer for CSV, Flat, XML, JSON, Key-Value, Parquet, YAML, Avro formatted files)
Created 2016-10-19
1,025 commits to master branch, last one 4 months ago
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...
Created 2015-10-27
5,433 commits to master branch, last one 16 days ago
A Python library for fast, interactive geospatial vector data visualization in Jupyter.
Created 2023-08-31
418 commits to main branch, last one 2 days ago
27
641
other
6
Query anything (CSV, GitHub, etc.) with SQL and let LLMs (ChatGPT, Claude) connect to these apps
Created 2024-04-06
343 commits to main branch, last one 7 days ago
67
620
mit
22
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython,...
Created 2020-10-25
724 commits to main branch, last one 10 months ago
61
598
apache-2.0
8
Fastest open-source tool for replicating Databases to Apache Iceberg or Data Lakehouse. ⚡ Efficient, quick and scalable data ingestion for real-time analytics. Supporting Postgres, MongoDB and MySQL
Created 2024-10-15
211 commits to master branch, last one 2 days ago
101
576
apache-2.0
37
Fast data store for Pandas time-series data
Created 2018-05-26
210 commits to main branch, last one 8 months ago
Data Preview 🈸 extension for importing 📤 viewing 🔎 slicing 🔪 dicing 🎲 charting 📊 & exporting 📥 large JSON array/config, YAML, Apache Arrow, Avro, Parquet & Excel data files
Created 2018-12-12
696 commits to master branch, last one about a year ago
20
564
apache-2.0
6
Rust-based WebAssembly bindings to read and write Apache Parquet data
Created 2022-02-27
352 commits to main branch, last one 2 months ago
21
522
postgresql
5
DuckDB-powered data lake analytics from Postgres
This repository has been archived
Created 2024-05-09
128 commits to dev branch, last one 11 days ago
59
478
apache-2.0
348
Iceberg is a table format for large, slow-moving tabular data
Created 2017-12-13
278 commits to master branch, last one 6 years ago
Copy to/from Parquet in S3 or Azure Blob Storage from within PostgreSQL
Created 2024-09-04
65 commits to main branch, last one 16 days ago