37 results found
lakeFS - Data version control for your data lake | Git for data
Created
2019-09-12
5,727 commits to master branch, last one 2 days ago
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Created
2022-01-26
3,549 commits to devel branch, last one 4 days ago
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created
2017-12-18
4,265 commits to master branch, last one a day ago
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
Created
2022-09-29
236 commits to master branch, last one about a year ago
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.
Created
2020-01-20
80 commits to master branch, last one 5 years ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created
2020-02-13
50 commits to master branch, last one 5 years ago
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is license...
Created
2017-02-08
8,313 commits to master branch, last one 6 years ago
Personal Data Engineering Projects
Created
2020-04-20
65 commits to master branch, last one 2 years ago
Data API Framework for AI Agents and Data Apps
Created
2022-04-27
1,056 commits to develop branch, last one 11 months ago
Lakekeeper is an Apache-Licensed, secure, fast and easy to use Apache Iceberg REST Catalog written in Rust.
Created
2024-04-05
672 commits to main branch, last one 23 hours ago
Generic Data Ingestion & Dispersal Library for Hadoop
This repository has been archived
Created
2018-01-05
33 commits to master branch, last one 5 years ago
Enterprise-grade, production-hardened, serverless data lake on AWS
Created
2020-09-08
647 commits to main branch, last one 2 days ago
Use SQL to build ELT pipelines on a data lakehouse.
Created
2021-03-11
481 commits to main branch, last one 2 years ago
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Created
2020-02-07
723 commits to master branch, last one 25 days ago
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
Created
2023-05-22
7 commits to master branch, last one 10 months ago
U-SQL Examples and Issue Tracking
Created
2015-10-13
253 commits to master branch, last one 2 years ago
🤖 The Semantic Engine for Model Context Protocol (MCP) Clients and AI Agents 🔥
Created
2022-05-09
898 commits to main branch, last one 3 days ago
Resources for video demonstrations and blog posts related to DataOps on AWS
Created
2021-11-07
107 commits to main branch, last one 3 years ago
An efficient storage and compute engine for both on-prem and cloud-native data analytics.
Created
2019-06-21
1,439 commits to master branch, last one 13 days ago
Samples and Docs for Azure Data Lake Store and Analytics
Created
2015-04-28
861 commits to master branch, last one 2 years ago
Apache Spark 3 - Structured Streaming Course Material
Created
2020-07-21
29 commits to master branch, last one 4 years ago
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Created
2019-08-07
2,046 commits to develop-spark3 branch, last one 5 days ago
Cloudflare R2 bucket File Uploader with multipart upload enabled. Tested with files up to 10 GB in size. Demo example for NextJS.
Created
2023-09-13
24 commits to main branch, last one 7 months ago
A Git-like Version Control File System for AI & Data Product Management.
Created
2023-11-24
298 commits to main branch, last one 3 months ago
Apache Spark Course Material
Created
2020-05-05
34 commits to master branch, last one 4 years ago
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
Created
2023-04-04
15 commits to main branch, last one about a year ago
GraphQL API for Zeebe data
Created
2020-02-03
781 commits to main branch, last one 5 months ago
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
This repository has been archived
Created
2018-01-29
135 commits to master branch, last one 6 months ago
The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt.
Created
2022-04-27
443 commits to main branch, last one 6 days ago
Web UI for Amazon Athena
Created
2020-12-30
42 commits to master branch, last one 2 years ago