30 results found

3.3k
12.8k
apache-2.0
285
Apache Doris is an easy-to-use, high performance and unified analytics database.
Created 2017-08-10
23,328 commits to master branch, last one 3 hours ago
3.0k
10.5k
apache-2.0
179
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created 2019-01-19
40,430 commits to master branch, last one 5 hours ago
1.8k
9.3k
apache-2.0
209
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
Created 2021-09-04
19,113 commits to main branch, last one 4 hours ago
1.7k
7.6k
apache-2.0
217
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Created 2019-04-22
3,586 commits to master branch, last one 7 hours ago
180
3.2k
apache-2.0
43
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created 2020-12-11
268 commits to main branch, last one a day ago
415
2.4k
apache-2.0
37
A native Rust library for Delta Lake, with bindings into Python
Created 2020-04-26
1,810 commits to main branch, last one 2 days ago
733
1.2k
apache-2.0
41
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created 2019-02-10
184 commits to master branch, last one 2 years ago
149
927
apache-2.0
28
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Created 2023-07-21
270 commits to main branch, last one 5 days ago
172
775
apache-2.0
29
An open protocol for secure data sharing
Created 2021-04-08
392 commits to main branch, last one 5 days ago
29
607
apache-2.0
14
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Created 2024-05-06
92 commits to main branch, last one 4 days ago
13
439
apache-2.0
10
Analytical database for data-driven Web applications 🪶
Created 2022-07-04
1,426 commits to main branch, last one 4 days ago
Iceberg/Delta Columnstore Table in Postgres
Created 2024-09-05
40 commits to main branch, last one 6 days ago
38
225
apache-2.0
18
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created 2022-11-11
17 commits to master branch, last one about a month ago
The Internals of Delta Lake
Created 2019-10-30
669 commits to main branch, last one 2 months ago
Sample project to demonstrate data engineering best practices
Created 2023-08-04
16 commits to main branch, last one 9 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created 2021-06-27
18 commits to hudi branch, last one 2 years ago
A Minimalistic Rust Implementation of Delta Sharing Server.
Created 2023-03-13
269 commits to main branch, last one 2 days ago
This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.
Created 2021-05-27
23 commits to main branch, last one 2 years ago
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Created 2020-02-17
39 commits to master branch, last one 4 years ago
9
66
apache-2.0
2
Lakehouse storage system benchmark
Created 2022-12-15
42 commits to main branch, last one about a year ago
Module 1 exercises - Bootcamp EDC - IGTI 2021
Created 2021-07-26
56 commits to master branch, last one 3 years ago
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Created 2022-02-02
4 commits to main branch, last one 2 years ago
A Delta Lake reader for Dask
Created 2021-09-13
79 commits to main branch, last one 2 months ago
14
47
apache-2.0
8
Read Delta tables without any Spark
Created 2020-12-23
54 commits to main branch, last one about a year ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created 2022-05-13
10 commits to master branch, last one about a year ago
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created 2021-04-12
165 commits to master branch, last one 2 years ago
Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc., for different levels that can be useful for Data Scientists, Developers and everyone who is interested in STEM.
Created 2021-03-05
48 commits to main branch, last one 3 years ago
Native Delta Lake Implementation in Go
Created 2023-03-23
39 commits to master branch, last one about a year ago
0
29
apache-2.0
3
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Created 2023-10-04
382 commits to release-0.6.1 branch, last one 2 months ago