20 results found

5.3k
15.7k
apache-2.0
862
The official home of the Presto distributed SQL query engine for big data
Created 2012-08-09
22,862 commits to master branch, last one 11 hours ago
3.1k
11.6k
apache-2.0
283
Apache Doris is an easy-to-use, high performance and unified analytics database.
Created 2017-08-10
19,618 commits to master branch, last one 14 hours ago
1.6k
8.1k
apache-2.0
208
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. ...
Created 2021-09-04
17,161 commits to main branch, last one 18 hours ago
419
2.3k
apache-2.0
245
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Created 2021-12-28
1,084 commits to main branch, last one 4 days ago
120
1.8k
apache-2.0
30
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Created 2022-12-05
72,444 commits to main branch, last one 11 hours ago
282
1.6k
apache-2.0
36
ByConity is an open source cloud data warehouse
Created 2022-12-22
71,616 commits to master branch, last one 18 days ago
251
721
apache-2.0
36
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Created 2022-07-14
1,267 commits to master branch, last one a day ago
165
398
apache-2.0
20
World's most powerful data catalog service, providing a high-performance, geo-distributed and federated metadata lake.
Created 2023-04-23
1,292 commits to main branch, last one 21 hours ago
28
283
apache-2.0
12
Use SQL to build ELT pipelines on a data lakehouse.
Created 2021-03-11
481 commits to main branch, last one 2 years ago
52
226
other
20
An intelligent database for the AI era
Created 2019-07-16
221 commits to master branch, last one 6 months ago
76
214
apache-2.0
11
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Created 2022-03-08
872 commits to main branch, last one 8 days ago
35
188
apache-2.0
18
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
Created 2022-11-11
11 commits to master branch, last one 11 days ago
Examples of using Terraform to deploy Databricks resources
Created 2022-06-10
165 commits to main branch, last one 21 hours ago
17
163
apache-2.0
13
Pure Rust Iceberg Implementation
Created 2023-06-15
182 commits to main branch, last one 25 days ago
7
137
apache-2.0
9
Unified storage framework for the entire machine learning lifecycle
Created 2023-12-15
99 commits to main branch, last one 3 months ago
A curated list of open source tools used in analytical stacks and data engineering ecosystem
Created 2024-02-22
8 commits to main branch, last one 3 months ago
9
58
apache-2.0
2
Lakehouse storage system benchmark
Created 2022-12-15
42 commits to main branch, last one about a year ago
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
Created 2023-04-04
15 commits to main branch, last one 9 months ago
DeltaOMS is a solution that helps build a centralized repository of Delta transaction logs and associated operational metrics/statistics for your Delta Lakehouse. Unity Catalog supported in the v0.7.0-...
Created 2021-04-12
165 commits to master branch, last one about a year ago
Creation of a data lakehouse and an ELT pipeline to enable the efficient analysis and use of data
Created 2022-05-13
10 commits to master branch, last one 6 months ago