36 results found
Chat with your database (SQL, CSV, pandas, polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5 / 4, Anthropic, VertexAI) and RAG.
Created
2023-04-22
1,079 commits to main branch, last one 8 days ago
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Created
2019-01-19
40,125 commits to master branch, last one 6 hours ago
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
Created
2021-09-04
18,889 commits to main branch, last one 19 hours ago
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop....
Created
2019-08-09
9,196 commits to main branch, last one 11 hours ago
Postgres for Search and Analytics
Created
2023-06-30
1,316 commits to dev branch, last one a day ago
Upserts, Deletes And Incremental Processing on Big Data.
Created
2016-12-14
5,823 commits to master branch, last one 2 days ago
lakeFS - Data version control for your data lake | Git for data
Created
2019-09-12
5,459 commits to master branch, last one 18 hours ago
Dinky is a real-time data development platform based on Apache Flink, enabling agile data development, deployment and operation.
Created
2021-06-09
2,567 commits to dev branch, last one 3 days ago
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
Created
2021-12-28
1,148 commits to main branch, last one a day ago
The LeoFS Storage System
Created
2012-06-06
1,664 commits to v1 branch, last one 4 years ago
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
Created
2023-04-23
1,893 commits to main branch, last one 16 hours ago
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Created
2021-08-25
2,030 commits to main branch, last one 8 days ago
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Created
2022-07-14
1,485 commits to master branch, last one 21 hours ago
A collection of Apache Hudi-related resources
Created
2019-12-11
254 commits to master branch, last one 3 days ago
A free to use dbt package for creating and loading Data Vault 2.0 compliant Data Warehouses (powered by dbt, an open source data engineering tool, registered trademark of dbt Labs)
Created
2019-09-27
4,037 commits to master branch, last one 5 months ago
DuckDB-powered analytics for Postgres
Created
2024-05-09
82 commits to dev branch, last one 3 days ago
Open Control Plane for Tables in Data Lakehouse
Created
2024-02-13
335 commits to main branch, last one 6 days ago
Use SQL to build ELT pipelines on a data lakehouse.
Created
2021-03-11
481 commits to main branch, last one 2 years ago
The Internals of Delta Lake
Created
2019-10-30
669 commits to main branch, last one about a month ago
An IDE and translation engine for detection engineers and threat hunters. Be faster, write smarter, keep 100% privacy.
Created
2023-11-01
760 commits to main branch, last one 15 days ago
A curated list of open source tools used in analytics platforms and data engineering ecosystem
Created
2024-02-22
16 commits to main branch, last one 19 hours ago
A Data Platform built for AWS, powered by Kubernetes.
This repository has been archived
Created
2020-10-08
894 commits to main branch, last one about a year ago
Roota is a public-domain language of threat detection and response that combines native queries from a SIEM, EDR, XDR, or Data Lake with standardized metadata and threat intelligence to enable automat...
Created
2023-11-01
98 commits to main branch, last one 3 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created
2021-06-27
18 commits to hudi branch, last one 2 years ago
Streaming application development and management system, based on Linkis and DSS, planning to provide the workflow-like graphical drag-and-drop development capability.
Created
2021-03-25
1,129 commits to main branch, last one 4 months ago
This repository will help you learn Databricks concepts through examples. It will include all the important topics we need in real-life work as data engineers. We wil...
Created
2022-05-10
45 commits to master branch, last one 2 years ago
Apache Spark Course Material
Created
2020-05-05
34 commits to master branch, last one 4 years ago
Apache Doris Website
Created
2018-09-21
1,469 commits to master branch, last one 15 hours ago
A Git-like Version Control File System for Datasets Management in the Era of AI.
Created
2023-11-24
292 commits to main branch, last one 6 days ago
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created
2019-11-16
15 commits to master branch, last one about a year ago