30 results found

3.3k
13.0k
apache-2.0
286
Apache Doris is an easy-to-use, high performance and unified analytics database.
Created 2017-08-10
23,860 commits to master branch, last one 8 hours ago
3.1k
10.7k
apache-2.0
180
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created 2019-01-19
40,718 commits to master branch, last one a day ago
1.9k
9.5k
apache-2.0
212
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
Created 2021-09-04
19,442 commits to main branch, last one 14 hours ago
1.7k
7.7k
apache-2.0
218
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Created 2019-04-22
3,638 commits to master branch, last one a day ago
184
3.2k
apache-2.0
43
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created 2020-12-11
280 commits to main branch, last one 20 hours ago
421
2.4k
apache-2.0
36
A native Rust library for Delta Lake, with bindings into Python
Created 2020-04-26
1,856 commits to main branch, last one 17 hours ago
743
1.2k
apache-2.0
41
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created 2019-02-10
184 commits to master branch, last one 3 years ago
151
955
apache-2.0
27
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Created 2023-07-21
278 commits to main branch, last one 4 days ago
173
787
apache-2.0
28
An open protocol for secure data sharing
Created 2021-04-08
402 commits to main branch, last one 5 days ago
28
613
apache-2.0
14
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Created 2024-05-06
95 commits to main branch, last one 15 days ago
14
455
apache-2.0
11
Analytical database for data-driven Web applications 🪶
Created 2022-07-04
1,447 commits to main branch, last one a day ago
Iceberg/Delta Columnstore Table in Postgres
Created 2024-09-05
63 commits to main branch, last one 2 days ago
39
229
apache-2.0
18
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created 2022-11-11
17 commits to master branch, last one 2 months ago
The Internals of Delta Lake
Created 2019-10-30
669 commits to main branch, last one 3 months ago
Sample project to demonstrate data engineering best practices
Created 2023-08-04
16 commits to main branch, last one 10 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created 2021-06-27
18 commits to hudi branch, last one 2 years ago
A Minimalistic Rust Implementation of Delta Sharing Server.
Created 2023-03-13
269 commits to main branch, last one about a month ago
This repository exemplifies a simple ELT process using Delta to perform upserts and remove data files that aren't in the latest state of the table's transaction log.
Created 2021-05-27
23 commits to main branch, last one 2 years ago
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Created 2020-02-17
39 commits to master branch, last one 4 years ago
9
66
apache-2.0
2
Lakehouse storage system benchmark
Created 2022-12-15
42 commits to main branch, last one about a year ago
Module 1 exercises - Bootcamp EDC - IGTI 2021
Created 2021-07-26
56 commits to master branch, last one 3 years ago
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Created 2022-02-02
4 commits to main branch, last one 2 years ago
A Delta Lake reader for Dask
Created 2021-09-13
79 commits to main branch, last one 3 months ago
14
47
apache-2.0
8
Read Delta tables without any Spark
Created 2020-12-23
54 commits to main branch, last one about a year ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created 2022-05-13
10 commits to master branch, last one about a year ago
Native Delta Lake Implementation in Go
Created 2023-03-23
39 commits to master branch, last one about a year ago
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created 2021-04-12
165 commits to master branch, last one 2 years ago
Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc. for different levels that can be useful for Data Scientists, Developers and everyone who is interested in STEM.
Created 2021-03-05
48 commits to main branch, last one 3 years ago
0
29
apache-2.0
3
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Created 2023-10-04
387 commits to release-0.6.1 branch, last one 25 days ago