37 results found
- Filter by Primary Language:
- Python (9)
- Java (5)
- Go (4)
- Scala (4)
- Jupyter Notebook (4)
- TypeScript (2)
- C# (1)
- Vue (1)
- C++ (1)
- Dockerfile (1)
- JavaScript (1)
- Kotlin (1)
- Rust (1)
lakeFS - Data version control for your data lake | Git for data
Created
2019-09-12
5,497 commits to master branch, last one a day ago
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Created
2022-01-26
3,447 commits to devel branch, last one 3 days ago
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created
2017-12-18
4,187 commits to master branch, last one 4 days ago
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
Created
2022-09-29
236 commits to master branch, last one 11 months ago
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Created
2020-01-20
80 commits to master branch, last one 4 years ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created
2020-02-13
50 commits to master branch, last one 4 years ago
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is license...
Created
2017-02-08
8,313 commits to master branch, last one 5 years ago
Personal Data Engineering Projects
Created
2020-04-20
65 commits to master branch, last one 2 years ago
Data API Framework for AI Agents and Data Apps
Created
2022-04-27
1,056 commits to develop branch, last one 8 months ago
Generic Data Ingestion & Dispersal Library for Hadoop
This repository has been archived
Created
2018-01-05
33 commits to master branch, last one 5 years ago
Enterprise-grade, production-hardened, serverless data lake on AWS
Created
2020-09-08
589 commits to main branch, last one 16 days ago
Lakekeeper: A Rust native Iceberg REST Catalog
Created
2024-04-05
476 commits to main branch, last one a day ago
Use SQL to build ELT pipelines on a data lakehouse.
Created
2021-03-11
481 commits to main branch, last one 2 years ago
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Created
2020-02-07
720 commits to master branch, last one 3 months ago
U-SQL Examples and Issue Tracking
Created
2015-10-13
253 commits to master branch, last one about a year ago
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
Created
2023-05-22
7 commits to master branch, last one 7 months ago
Resources for video demonstrations and blog posts related to DataOps on AWS
Created
2021-11-07
107 commits to main branch, last one 2 years ago
🤖 The semantic engine for LLMs, bringing semantic context to AI agents. 🔥
Created
2022-05-09
796 commits to main branch, last one 2 days ago
Samples and Docs for Azure Data Lake Store and Analytics
Created
2015-04-28
861 commits to master branch, last one about a year ago
An efficient storage and compute engine for both on-prem and cloud-native data analytics.
Created
2019-06-21
1,420 commits to master branch, last one 21 hours ago
Apache Spark 3 - Structured Streaming Course Material
Created
2020-07-21
29 commits to master branch, last one 4 years ago
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Created
2019-08-07
2,007 commits to develop-spark3 branch, last one 14 hours ago
Cloudflare R2 bucket File Uploader with multipart upload enabled. Tested with files up to 10 GB size.
Created
2023-09-13
24 commits to main branch, last one 4 months ago
A Git-like Version Control File System for AI & Data Product Management.
Created
2023-11-24
298 commits to main branch, last one 19 days ago
Apache Spark Course Material
Created
2020-05-05
34 commits to master branch, last one 4 years ago
GraphQL API for Zeebe data
Created
2020-02-03
781 commits to main branch, last one about a month ago
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
This repository has been archived
Created
2018-01-29
135 commits to master branch, last one 3 months ago
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
Created
2023-04-04
15 commits to main branch, last one about a year ago
Web UI for Amazon Athena
Created
2020-12-30
42 commits to master branch, last one 2 years ago
The DBT of ML, as Aligned describes data dependencies in ML systems and reduces technical data debt
Created
2022-04-27
429 commits to main branch, last one 26 days ago