29 results found

3.1k
11.7k
apache-2.0
281
Apache Doris is an easy-to-use, high performance and unified analytics database.
Created 2017-08-10
19,845 commits to master branch, last one 16 hours ago
2.8k
9.7k
apache-2.0
167
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created 2019-01-19
38,206 commits to master branch, last one 8 hours ago
1.7k
8.1k
apache-2.0
208
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Created 2021-09-04
17,264 commits to main branch, last one 14 hours ago
1.6k
7.1k
apache-2.0
217
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Created 2019-04-22
3,216 commits to master branch, last one 8 hours ago
172
3.1k
apache-2.0
43
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created 2020-12-11
258 commits to main branch, last one 17 days ago
362
1.9k
apache-2.0
39
A native Rust library for Delta Lake, with bindings into Python
Created 2020-04-26
1,487 commits to main branch, last one 16 hours ago
693
1.1k
apache-2.0
40
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created 2019-02-10
184 commits to master branch, last one 2 years ago
111
738
apache-2.0
25
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Created 2023-07-21
219 commits to main branch, last one 2 days ago
151
703
apache-2.0
28
An open protocol for secure data sharing
Created 2021-04-08
337 commits to main branch, last one 2 days ago
14
544
apache-2.0
15
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Created 2024-05-06
37 commits to main branch, last one 2 days ago
7
371
apache-2.0
8
Analytical database for data-driven Web applications 🪶
Created 2022-07-04
1,139 commits to main branch, last one 2 days ago
36
192
apache-2.0
18
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created 2022-11-11
11 commits to master branch, last one 23 days ago
The Internals of Delta Lake
Created 2019-10-30
659 commits to main branch, last one a day ago
Sample project to demonstrate data engineering best practices
Created 2023-08-04
16 commits to main branch, last one 3 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created 2021-06-27
18 commits to hudi branch, last one 2 years ago
A Minimalistic Rust Implementation of Delta Sharing Server.
Created 2023-03-13
258 commits to main branch, last one about a month ago
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Created 2020-02-17
39 commits to master branch, last one 3 years ago
9
58
apache-2.0
2
Lakehouse storage system benchmark
Created 2022-12-15
42 commits to main branch, last one about a year ago
Exercises for module 1 - Bootcamp EDC - IGTI 2021
Created 2021-07-26
56 commits to master branch, last one 2 years ago
16
46
apache-2.0
8
Read Delta tables without any Spark
Created 2020-12-23
54 commits to main branch, last one 6 months ago
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Created 2022-02-02
4 commits to main branch, last one 2 years ago
A Delta Lake reader for Dask
Created 2021-09-13
75 commits to main branch, last one 2 months ago
This repository exemplifies a simple ELT process that uses Delta to perform upserts and to remove data files that aren't in the latest state of the table's transaction log.
Created 2021-05-27
23 commits to main branch, last one 2 years ago
Books and papers in mathematics, econometrics, machine learning, finance, etc., at different levels, useful for data scientists, developers, and everyone who is interested in STEM.
Created 2021-03-05
48 commits to main branch, last one 3 years ago
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created 2021-04-12
165 commits to master branch, last one about a year ago
Native Delta Lake Implementation in Go
Created 2023-03-23
39 commits to master branch, last one 7 months ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created 2022-05-13
10 commits to master branch, last one 6 months ago
0
29
apache-2.0
1
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Created 2023-10-04
342 commits to main branch, last one 19 hours ago