33 results found

332
4.1k
apache-2.0
40
lakeFS - Data version control for your data lake | Git for data
Created 2019-09-12
5,199 commits to master branch, last one 15 hours ago
868
2.0k
apache-2.0
65
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created 2017-12-18
3,990 commits to master branch, last one 17 hours ago
110
1.9k
apache-2.0
19
data load tool (dlt) is an open source Python library that makes data loading easy 🛠️
Created 2022-01-26
3,037 commits to devel branch, last one a day ago
327
1.6k
apache-2.0
61
BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every d...
Created 2022-09-29
236 commits to master branch, last one 5 months ago
A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.
Created 2020-01-20
80 commits to master branch, last one 4 years ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 4 years ago
581
1.1k
apache-2.0
115
Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is license...
Created 2017-02-08
8,313 commits to master branch, last one 5 years ago
Personal Data Engineering Projects
Created 2020-04-20
65 commits to master branch, last one about a year ago
25
604
apache-2.0
14
Data API Framework for AI Agents and Data Apps
Created 2022-04-27
1,056 commits to develop branch, last one about a month ago
110
474
other
28
Generic Data Ingestion & Dispersal Library for Hadoop
This repository has been archived
Created 2018-01-05
33 commits to master branch, last one 5 years ago
Enterprise-grade, production-hardened, serverless data lake on AWS
Created 2020-09-08
364 commits to main branch, last one 18 hours ago
28
283
apache-2.0
12
Use SQL to build ELT pipelines on a data lakehouse.
Created 2021-03-11
481 commits to main branch, last one 2 years ago
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Created 2020-02-07
712 commits to master branch, last one 8 days ago
683
232
mit
113
U-SQL Examples and Issue Tracking
Created 2015-10-13
253 commits to master branch, last one about a year ago
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
Created 2023-05-22
7 commits to master branch, last one about a month ago
Resources for video demonstrations and blog posts related to DataOps on AWS
Created 2021-11-07
107 commits to main branch, last one 2 years ago
103
140
mit
109
Samples and Docs for Azure Data Lake Store and Analytics
Created 2015-04-28
861 commits to master branch, last one about a year ago
Apache Spark 3 - Structured Streaming Course Material
Created 2020-07-21
29 commits to master branch, last one 3 years ago
Smart Automation Tool for building modern Data Lakes and Data Pipelines
Created 2019-08-07
1,875 commits to develop-spark3 branch, last one 2 days ago
Apache Spark Course Material
Created 2020-05-05
34 commits to master branch, last one 3 years ago
17
75
apache-2.0
12
Wren Engine is the backbone of the WrenAI project - The semantic engine for LLMs, bringing business context to AI agents.
Created 2022-05-09
437 commits to main branch, last one a day ago
Cloudflare R2 bucket File Uploader
Created 2023-09-13
19 commits to main branch, last one 4 months ago
GraphQL API for Zeebe data
Created 2020-02-03
720 commits to main branch, last one 21 hours ago
9
60
apache-2.0
25
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.
Created 2018-01-29
134 commits to master branch, last one 3 years ago
26
52
apache-2.0
3
Web UI for Amazon Athena
Created 2020-12-30
42 commits to master branch, last one about a year ago
1
48
apache-2.0
12
The DBT of ML: Aligned describes data dependencies in ML systems and reduces technical data debt
Created 2022-04-27
410 commits to main branch, last one 2 days ago
Udacity Data Engineering Nanodegree Program
Created 2021-01-19
47 commits to main branch, last one 3 years ago
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
Created 2023-04-04
15 commits to main branch, last one 9 months ago