30 results found
Apache Doris is an easy-to-use, high-performance, and unified analytics database.
Created
2017-08-10
23,328 commits to master branch, last one 3 hours ago
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created
2019-01-19
40,430 commits to master branch, last one 5 hours ago
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
Created
2021-09-04
19,113 commits to main branch, last one 4 hours ago
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Created
2019-04-22
3,586 commits to master branch, last one 7 hours ago
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created
2020-12-11
268 commits to main branch, last one a day ago
A native Rust library for Delta Lake, with bindings into Python
Created
2020-04-26
1,810 commits to main branch, last one 2 days ago
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created
2019-02-10
184 commits to master branch, last one 2 years ago
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Created
2023-07-21
270 commits to main branch, last one 5 days ago
An open protocol for secure data sharing
Created
2021-04-08
392 commits to main branch, last one 5 days ago
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Created
2024-05-06
92 commits to main branch, last one 4 days ago
Analytical database for data-driven Web applications 🪶
Created
2022-07-04
1,426 commits to main branch, last one 4 days ago
Amazon SageMaker Local Mode Examples
Created
2020-11-05
382 commits to main branch, last one 4 months ago
Iceberg/Delta Columnstore Table in Postgres
Created
2024-09-05
40 commits to main branch, last one 6 days ago
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created
2022-11-11
17 commits to master branch, last one about a month ago
The Internals of Delta Lake
Created
2019-10-30
669 commits to main branch, last one 2 months ago
Sample project to demonstrate data engineering best practices
Created
2023-08-04
16 commits to main branch, last one 9 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created
2021-06-27
18 commits to hudi branch, last one 2 years ago
A Minimalistic Rust Implementation of Delta Sharing Server.
Created
2023-03-13
269 commits to main branch, last one 2 days ago
This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.
Created
2021-05-27
23 commits to main branch, last one 2 years ago
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Created
2020-02-17
39 commits to master branch, last one 4 years ago
Lakehouse storage system benchmark
Created
2022-12-15
42 commits to main branch, last one about a year ago
Exercises for module 1 - Bootcamp EDC - IGTI 2021
Created
2021-07-26
56 commits to master branch, last one 3 years ago
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Created
2022-02-02
4 commits to main branch, last one 2 years ago
A Delta Lake reader for Dask
Created
2021-09-13
79 commits to main branch, last one 2 months ago
Read Delta tables without any Spark
Created
2020-12-23
54 commits to main branch, last one about a year ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created
2022-05-13
10 commits to master branch, last one about a year ago
DeltaOMS is a solution that helps build a centralized repository of Delta Transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created
2021-04-12
165 commits to master branch, last one 2 years ago
Books and Papers in Mathematics, Econometrics, Machine Learning, Finance, etc., for different levels that can be useful for Data Scientists, Developers, and everyone who is interested in STEM.
Created
2021-03-05
48 commits to main branch, last one 3 years ago
Native Delta Lake Implementation in Go
Created
2023-03-23
39 commits to master branch, last one about a year ago
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Created
2023-10-04
382 commits to release-0.6.1 branch, last one 2 months ago