29 results found
Apache Doris is an easy-to-use, high-performance, unified analytics database.
Created
2017-08-10
19,845 commits to master branch, last one 16 hours ago
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created
2019-01-19
38,206 commits to master branch, last one 8 hours ago
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
Created
2021-09-04
17,264 commits to main branch, last one 14 hours ago
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Created
2019-04-22
3,216 commits to master branch, last one 8 hours ago
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
Created
2020-12-11
258 commits to main branch, last one 17 days ago
A native Rust library for Delta Lake, with bindings into Python
Created
2020-04-26
1,487 commits to main branch, last one 16 hours ago
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created
2019-02-10
184 commits to master branch, last one 2 years ago
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Created
2023-07-21
219 commits to main branch, last one 2 days ago
An open protocol for secure data sharing
Created
2021-04-08
337 commits to main branch, last one 2 days ago
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Created
2024-05-06
37 commits to main branch, last one 2 days ago
Analytical database for data-driven Web applications 🪶
Created
2022-07-04
1,139 commits to main branch, last one 2 days ago
Amazon SageMaker Local Mode Examples
Created
2020-11-05
377 commits to main branch, last one 22 days ago
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created
2022-11-11
11 commits to master branch, last one 23 days ago
The Internals of Delta Lake
Created
2019-10-30
659 commits to main branch, last one a day ago
Sample project to demonstrate data engineering best practices
Created
2023-08-04
16 commits to main branch, last one 3 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created
2021-06-27
18 commits to hudi branch, last one 2 years ago
A Minimalistic Rust Implementation of Delta Sharing Server.
Created
2023-03-13
258 commits to main branch, last one about a month ago
Streaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Created
2020-02-17
39 commits to master branch, last one 3 years ago
Lakehouse storage system benchmark
Created
2022-12-15
42 commits to main branch, last one about a year ago
Module 1 exercises - Bootcamp EDC - IGTI 2021
Created
2021-07-26
56 commits to master branch, last one 2 years ago
Read Delta tables without any Spark
Created
2020-12-23
54 commits to main branch, last one 6 months ago
Jupyter notebooks and AWS CloudFormation template to show how Hudi, Iceberg, and Delta Lake work
Created
2022-02-02
4 commits to main branch, last one 2 years ago
A Delta Lake reader for Dask
Created
2021-09-13
75 commits to main branch, last one 2 months ago
This repository exemplifies a simple ELT process using delta to perform upsert and remove data files that aren't in the latest state of the transaction log for the table.
Created
2021-05-27
23 commits to main branch, last one 2 years ago
Books and papers in mathematics, econometrics, machine learning, finance, etc., for different levels, that can be useful for data scientists, developers, and everyone who is interested in STEM.
Created
2021-03-05
48 commits to main branch, last one 3 years ago
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created
2021-04-12
165 commits to master branch, last one about a year ago
Native Delta Lake Implementation in Go
Created
2023-03-23
39 commits to master branch, last one 7 months ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created
2022-05-13
10 commits to master branch, last one 6 months ago
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Created
2023-10-04
342 commits to main branch, last one 19 hours ago