35 results found

1.4k
13.9k
other
113
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
Created 2023-04-22
1,086 commits to main branch, last one 11 days ago
3.1k
10.6k
apache-2.0
180
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created 2019-01-19
40,649 commits to master branch, last one 2 days ago
1.9k
9.5k
apache-2.0
211
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
Created 2021-09-04
19,392 commits to main branch, last one 5 hours ago
630
8.3k
apache-2.0
91
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....
Created 2019-08-09
9,202 commits to main branch, last one 10 days ago
2.4k
5.5k
apache-2.0
1.2k
Upserts, Deletes And Incremental Processing on Big Data.
Created 2016-12-14
5,961 commits to master branch, last one 2 days ago
360
4.5k
apache-2.0
44
lakeFS - Data version control for your data lake | Git for data
Created 2019-09-12
5,501 commits to master branch, last one a day ago
1.2k
3.3k
apache-2.0
39
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Created 2021-06-09
2,663 commits to dev branch, last one 11 hours ago
476
2.6k
apache-2.0
291
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Created 2021-12-28
1,157 commits to main branch, last one about a month ago
155
1.6k
apache-2.0
85
The LeoFS Storage System
Created 2012-06-06
1,664 commits to v1 branch, last one 4 years ago
380
1.2k
apache-2.0
30
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Created 2023-04-23
2,099 commits to main branch, last one 2 days ago
121
968
agpl-3.0
16
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Created 2021-08-25
2,354 commits to main branch, last one a day ago
305
895
apache-2.0
37
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Created 2022-07-14
1,509 commits to master branch, last one 7 days ago
158
541
unknown
39
A collection of Apache Hudi related resources
Created 2019-12-11
262 commits to master branch, last one a day ago
132
521
apache-2.0
27
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Created 2019-09-27
4,037 commits to master branch, last one 6 months ago
16
418
postgresql
2
DuckDB-powered analytics for Postgres
Created 2024-05-09
88 commits to dev branch, last one 11 days ago
52
320
bsd-2-clause
15
Open Control Plane for Tables in Data Lakehouse
Created 2024-02-13
361 commits to main branch, last one 9 days ago
28
285
apache-2.0
12
Use SQL to build ELT pipelines on a data lakehouse.
Created 2021-03-11
481 commits to main branch, last one 2 years ago
The Internals of Delta Lake
Created 2019-10-30
669 commits to main branch, last one 3 months ago
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Created 2024-02-22
16 commits to main branch, last one about a month ago
An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
Created 2023-11-01
850 commits to main branch, last one 11 days ago
26
127
apache-2.0
13
A Data Platform built for AWS, powered by Kubernetes.
This repository has been archived
Created 2020-10-08
894 commits to main branch, last one about a year ago
9
120
other
8
Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automat...
Created 2023-11-01
98 commits to main branch, last one 5 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created 2021-06-27
18 commits to hudi branch, last one 2 years ago
44
103
apache-2.0
20
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Created 2021-03-25
932 commits to main branch, last one 3 days ago
This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work. We wil...
Created 2022-05-10
45 commits to master branch, last one 2 years ago
Apache Spark Course Material
Created 2020-05-05
34 commits to master branch, last one 4 years ago
249
85
unknown
34
Apache Doris Website
Created 2018-09-21
1,755 commits to master branch, last one 3 hours ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
Created 2023-08-27
4 commits to main branch, last one about a year ago