42 results found

250
4.0k
apache-2.0
43
Build data pipelines, the easy way 🛠️
Created 2020-05-21
9,688 commits to master branch, last one 12 months ago
973
3.8k
apache-2.0
73
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
Created 2019-05-27
1,806 commits to dev branch, last one 3 days ago
Implementing best practices for PySpark ETL jobs and applications.
Created 2017-12-28
36 commits to master branch, last one 2 years ago
86
1.5k
bsd-3-clause-clear
12
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.
Created 2023-02-23
1,410 commits to main branch, last one 12 hours ago
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
Created 2020-01-20
80 commits to master branch, last one 4 years ago
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Created 2020-02-13
50 commits to master branch, last one 4 years ago
38
868
bsd-3-clause-clear
20
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
This repository has been archived
Created 2020-05-26
516 commits to main branch, last one 11 months ago
A Clojure high performance data processing system
Created 2019-02-14
1,688 commits to master branch, last one 14 days ago
151
576
mit
52
A simplified, lightweight ETL Framework based on Apache Spark
Created 2017-10-10
658 commits to master branch, last one about a year ago
22
367
mit
6
Flow PHP - data processing framework
Created 2021-05-23
3,760 commits to 1.x branch, last one 5 days ago
31
177
apache-2.0
13
A simple Spark-powered ETL framework that just works 🍺
Created 2019-12-20
627 commits to master branch, last one about a year ago
17
172
agpl-3.0
3
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Created 2024-02-21
390 commits to main branch, last one a day ago
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
Created 2020-08-27
1,092 commits to master branch, last one 2 years ago
This is a template you can use for your next data engineering portfolio project.
Created 2021-09-10
3 commits to main branch, last one 2 years ago
11
123
mit
6
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Created 2022-06-22
504 commits to main branch, last one 18 hours ago
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
Created 2023-09-06
3 commits to main branch, last one 8 months ago
Data pipelines from re-usable components
Created 2020-05-19
253 commits to master branch, last one about a year ago
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.
Created 2021-04-15
247 commits to main branch, last one about a year ago
Regular practice on Data Science, Machine Learning, Deep Learning, solving ML project problems, and analytical issues. Regularly boosting my knowledge. The goal is to help learners with learning resources on Da...
Created 2020-09-06
206 commits to master branch, last one about a year ago
An app engine for your business. Seamlessly implement business logic with a powerful API. Out-of-the-box CMS, blog, forum, and email functionality. Developer-friendly and easily extendable for your next ...
Created 2021-04-03
1,079 commits to master branch, last one 7 months ago
This repository will help you learn Databricks concepts through examples. It will include all the important topics we need in real-life experience as data engineers. We wil...
Created 2022-05-10
45 commits to master branch, last one about a year ago
SEO dashboard built from Search Console data using the Google Search API, a MySQL database, a Node.js REST API (Express.js), and a React dashboard
Created 2019-05-27
61 commits to master branch, last one about a year ago
2
79
apache-2.0
3
Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python.
Created 2022-07-21
673 commits to main branch, last one 2 months ago
10
76
unknown
4
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Created 2022-10-06
2,074 commits to main branch, last one 16 hours ago
9
72
apache-2.0
6
Move your data with ease.
Created 2023-06-19
347 commits to main branch, last one 9 days ago
3
66
bsd-3-clause
5
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Created 2016-09-14
65 commits to master branch, last one 2 years ago
Azure Data Factory Hands On Lab - Step by Step - A Comprehensive Azure Data Factory and Mapping Data Flow step by step tutorial
Created 2020-02-12
98 commits to master branch, last one 3 years ago
Ethereum Analytical Database - Ethereum data access solution that can be used for analytics and application development. The solution works on a fast DB - Clickhouse.
Created 2019-03-08
936 commits to master branch, last one 2 years ago
One ETL tool to rule them all
Created 2023-04-19
1,264 commits to develop branch, last one 2 days ago
Near real time ETL to populate a dashboard.
Created 2021-07-10
23 commits to main branch, last one 11 months ago