Trending repositories for topic hadoop

Last 3 days (new repositories)

no newly created repositories trending in the last 3 days

Last 3 days (absolute gain)

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+39)

apache-2.0

HariSekhon/DevOps-Bash-tools

1000+ DevOps Bash Scripts - AWS, GCP, Kubernetes, Docker, CI/CD, APIs, SQL, PostgreSQL, MySQL, Hive, Impala, Kafka, Hadoop, Jenkins, GitHub, GitLab, BitBucket, Azure DevOps, TeamCity, Spotify, MP3, LD...

5,529 (+22)

mit

Tencent/APIJSON

🏆 Real-Time coding-free, powerful and secure ORM 🚀 providing APIs and Docs without coding by Backend, and the returned JSON of API can...

17,290 (+15)

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+14)

apache-2.0

donnemartin/data-science-ipython-notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...

27,492 (+11)

spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

17,888 (+10)

apache-2.0

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,065 (+8)

apache-2.0

heibaiying/BigData-Notes

Getting-started guide to big data :star:

15,949 (+8)

apache/calcite

Apache Calcite

4,617 (+6)

apache-2.0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+6)

deeplearning4j/deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math c...

13,688 (+5)

apache-2.0

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

9,792 (+5)

h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Me...

6,931 (+5)

apache-2.0

apache/hadoop

Apache Hadoop

14,786 (+5)

apache-2.0

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,927 (+4)

apache-2.0

collabH/bigdata-growth

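A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.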

1,493 (+3)

mit

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

46 (+2)

gpl-3.0

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉

282 (+2)

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

6,864 (+2)

apache-2.0

hoangsonww/Moodify-Emotion-Music-App

🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile pla...

25 (+1)

mit

Last 3 days (relative gain)

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

46 (+5%)

gpl-3.0

hoangsonww/Moodify-Emotion-Music-App

25 (+4%)

mit

snowlift/trino-storage

Storage connector for Trino

97 (+1%)

apache-2.0

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉

282 (+0.7%)

HariSekhon/DevOps-Bash-tools

5,529 (+0.4%)

mit

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+0.3%)

apache-2.0

cubefs/compass

Compass is a task diagnosis platform for bigdata

362 (+0.3%)

apache-2.0

dromara/CloudEon

CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underl...

442 (+0.2%)

apache-2.0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+0.2%)

cdarlint/winutils

winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows

1,937 (+0.2%)

collabH/bigdata-growth

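A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.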

1,493 (+0.2%)

mit

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,927 (+0.1%)

apache-2.0

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+0.1%)

apache-2.0

apache/calcite

Apache Calcite

4,617 (+0.1%)

apache-2.0

oeljeklaus-you/UserActionAnalyzePlatform

Big data platform for e-commerce user behavior analysis

976 (+0.1%)

apache-2.0

Tencent/APIJSON

17,290 (+0.1%)

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+0.1%)

wgzhao/Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

1,199 (+0.1%)

apache-2.0

DTStack/Taier

Taier is a big data development platform for submission, scheduling, operation and maintenance, and indicator information display

1,336 (+0.1%)

apache-2.0

apache/drill

Apache Drill is a distributed MPP query layer for self describing data

1,948 (+0.1%)

apache-2.0

Last week (new repositories)

no newly created repositories trending in the last week

Last week (absolute gain)

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+68)

apache-2.0

HariSekhon/DevOps-Bash-tools

5,529 (+35)

mit

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+33)

apache-2.0

donnemartin/data-science-ipython-notebooks

27,492 (+23)

Tencent/APIJSON

17,290 (+22)

spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

17,888 (+18)

apache-2.0

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

9,792 (+17)

heibaiying/BigData-Notes

Getting-started guide to big data :star:

15,949 (+17)

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,065 (+13)

apache-2.0

apache/hadoop

Apache Hadoop

14,786 (+12)

apache-2.0

deeplearning4j/deeplearning4j

13,688 (+11)

apache-2.0

h2oai/h2o-3

6,931 (+11)

apache-2.0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+9)

apache/ignite

Apache Ignite

4,819 (+8)

apache-2.0

apache/calcite

Apache Calcite

4,617 (+8)

apache-2.0

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+7)

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,927 (+7)

apache-2.0

collabH/bigdata-growth

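A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.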

1,493 (+6)

mit

apache/hive

Apache Hive

5,556 (+6)

apache-2.0

HuQi2018/BiSheServer

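This system is my graduation project, titled "Design and Implementation of a Movie Recommendation System Based on User Profiles". It uses Django as the base framework with the MTV pattern, and MongoDB, MySQL, and Redis as databases, with movie data crawled from the Douban platform as the base data source. It is a recommendation system that builds user tags mainly from users' basic information and behavioral information such as usage records, and uses the Hadoop and Spark big data components for analysis and processing. The admin system is Django's built-in admin system, and uses si...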

562 (+5)

apache-2.0

Last week (relative gain)

hoangsonww/Moodify-Emotion-Music-App

25 (+9%)

mit

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

46 (+7%)

gpl-3.0

The-Joker123/BigData_beauty_analysis

Large-screen data visualization and big data analysis (SpringBoot+hiveJDBC+echarts)

37 (+3%)

snowlift/trino-storage

Storage connector for Trino

97 (+2%)

apache-2.0

justdoitMr/rzf.github.io

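✏️ [Computer fundamentals + Java basics + big data basics and advanced topics + interview guide] A project covering most of the core knowledge in computer fundamentals, Java, big data, and interview preparation. Study, interview, and improve together!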

57 (+2%)

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉

282 (+1%)

HuQi2018/BiSheServer

562 (+0.9%)

apache-2.0

HariSekhon/DevOps-Bash-tools

5,529 (+0.6%)

mit

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+0.6%)

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

335 (+0.6%)

mit

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+0.5%)

apache-2.0

dromara/CloudEon

442 (+0.5%)

apache-2.0

collabH/bigdata-growth

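A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.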

1,493 (+0.4%)

mit

cdarlint/winutils

winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows

1,937 (+0.4%)

wgzhao/Addax

Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.

1,199 (+0.3%)

apache-2.0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+0.3%)

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+0.3%)

apache-2.0

cubefs/compass

Compass is a task diagnosis platform for bigdata

362 (+0.3%)

apache-2.0

PriyankaJhaTheAnalyst/DataAnalystPortfolioProjects

This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).

371 (+0.3%)

apache/nutch

Apache Nutch is an extensible and scalable web crawler

2,927 (+0.2%)

apache-2.0

Last month (new repositories)

no newly created repositories trending in the last month

Last month (absolute gain)

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+229)

apache-2.0

HariSekhon/DevOps-Bash-tools

5,529 (+189)

mit

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+137)

apache-2.0

donnemartin/data-science-ipython-notebooks

27,492 (+121)

spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

17,888 (+94)

apache-2.0

heibaiying/BigData-Notes

Getting-started guide to big data :star:

15,949 (+89)

Tencent/APIJSON

17,290 (+72)

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

9,792 (+71)

apache/hadoop

Apache Hadoop

14,786 (+63)

apache-2.0

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,065 (+62)

apache-2.0

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+46)

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+38)

deeplearning4j/deeplearning4j

13,688 (+37)

apache-2.0

apache/calcite

Apache Calcite

4,617 (+36)

apache-2.0

HuQi2018/BiSheServer

562 (+33)

apache-2.0

h2oai/h2o-3

6,931 (+33)

apache-2.0

collabH/bigdata-growth

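A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.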

1,493 (+32)

mit

apache/ignite

Apache Ignite

4,819 (+31)

apache-2.0

LuckyZXL2016/Movie_Recommend

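A Spark-based movie recommendation system, including a crawler project, a website, a back-office admin system, and the Spark recommendation system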

2,823 (+30)

mit

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

6,864 (+27)

apache-2.0

Last month (relative gain)

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

32 (+23%)

mit

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

46 (+21%)

gpl-3.0

confluentinc/kafka-connect-hdfs

Kafka Connect HDFS connector

12 (+20%)

hoangsonww/Moodify-Emotion-Music-App

25 (+19%)

mit

tuanx18/data-engineer-portfolio

This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.

32 (+19%)

Mrkuhuo/bigdata_learning

Code for learning big data components

42 (+11%)

myamafuj/hadoop-hive-spark-docker

Hadoop-Hive-Spark cluster + Jupyter on Docker

61 (+7%)

wzqwtt/BigData

A beginner's big data study notes :star:

31 (+7%)

HariSekhon/Knowledge-Base

IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

96 (+7%)

mit

HuQi2018/BiSheServer

562 (+6%)

apache-2.0

mrugankray/Big-Data-Cluster

The goal of this project is to build a docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center ...

55 (+6%)

mit

The-Joker123/BigData_beauty_analysis

Large-screen data visualization and big data analysis (SpringBoot+hiveJDBC+echarts)

37 (+6%)

wzdnzd/bigdata-notes

BigData Learning Notes

46 (+5%)

apache-2.0

snowlift/trino-storage

Storage connector for Trino

97 (+4%)

apache-2.0

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+4%)

PriyankaJhaTheAnalyst/DataAnalystPortfolioProjects

This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).

371 (+4%)

smart-data-lake/smart-data-lake

Smart Automation Tool for building modern Data Lakes and Data Pipelines

111 (+4%)

gpl-3.0

justdoitMr/rzf.github.io

57 (+4%)

marcelmay/hfsa

Hadoop FSImage Analyzer (HFSA)

58 (+4%)

apache-2.0

HariSekhon/DevOps-Bash-tools

5,529 (+4%)

mit

Last 12-months (new repositories)

HariSekhon/Knowledge-Base

IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

mit

hoangsonww/Moodify-Emotion-Music-App

mit

Last 12-months (absolute gain)

HariSekhon/DevOps-Bash-tools

5,529 (+3,526)

mit

apache/doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

12,757 (+2,666)

apache-2.0

donnemartin/data-science-ipython-notebooks

27,492 (+1,688)

trinodb/trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

10,483 (+1,634)

apache-2.0

heibaiying/BigData-Notes

Getting-started guide to big data :star:

15,949 (+1,255)

Tencent/APIJSON

17,290 (+1,249)

spotify/luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

17,888 (+978)

apache-2.0

wangzhiwubigdata/God-Of-BigData

Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

9,792 (+952)

prestodb/presto

The official home of the Presto distributed SQL query engine for big data

16,065 (+843)

apache-2.0

apache/hadoop

Apache Hadoop

14,786 (+803)

apache-2.0

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+533)

apache/calcite

Apache Calcite

4,617 (+503)

apache-2.0

deeplearning4j/deeplearning4j

13,688 (+453)

apache-2.0

Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

6,864 (+420)

apache-2.0

MoRan1607/BigDataGuide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

2,726 (+409)

apache/hive

Apache Hive

5,556 (+407)

apache-2.0

collabH/bigdata-growth

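A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.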

1,493 (+395)

mit

h2oai/h2o-3

6,931 (+377)

apache-2.0

linkedin/school-of-sre

At LinkedIn, we are using this curriculum for onboarding our entry-level talents into the SRE role.

7,844 (+370)

apache/kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

2,105 (+321)

apache-2.0

Last 12-months (relative gain)

ait-aecid/anomaly-detection-log-datasets

Analysis scripts for log data sets used in anomaly detection.

46 (+667%)

gpl-3.0

tuanx18/data-engineer-portfolio

This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.

32 (+540%)

AhmetFurkanDEMIR/Data-Engineering-Project-with-HDFS-and-Kafka

Data Engineering Project with Hadoop HDFS and Kafka

32 (+357%)

mit

dogukannulu/streaming_data_processing

Create a streaming data, transfer it to Kafka, modify it with PySpark, take it to ElasticSearch and MinIO

57 (+307%)

HariSekhon/DevOps-Bash-tools

5,529 (+176%)

mit

Mrkuhuo/bigdata_learning

Code for learning big data components

42 (+163%)

mrugankray/Big-Data-Cluster

55 (+139%)

mit

myamafuj/hadoop-hive-spark-docker

Hadoop-Hive-Spark cluster + Jupyter on Docker

61 (+135%)

PriyankaJhaTheAnalyst/DataAnalystPortfolioProjects

This repository contains my Data Analytics portfolio projects ranging from SQL, Python, Tableau, Excel, and Hadoop (HiveQL).

371 (+126%)

The-Joker123/BigData_beauty_analysis

Large-screen data visualization and big data analysis (SpringBoot+hiveJDBC+echarts)

37 (+118%)

fancyChuan/bigdata-hub

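A knowledge system for data infrastructure and big data technology, covering mainstream frameworks such as hadoop, hive, spark, and flink and their related frameworks, plus data middle platforms, data lakes, data governance, data warehouse construction, digital transformation, and more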

327 (+97%)

OBenner/data-engineering-interview-questions

More than 2000+ Data engineer interview questions.

1,153 (+86%)

justdoitMr/rzf.github.io

57 (+73%)

apache/doris-website

Apache Doris Website

82 (+64%)

HuQi2018/BiSheServer

562 (+61%)

apache-2.0

wzqwtt/BigData

A beginner's big data study notes :star:

31 (+55%)

wzdnzd/bigdata-notes

BigData Learning Notes

46 (+53%)

apache-2.0

apache/doris-thirdparty

Self-managed thirdparty dependencies for Apache Doris

32 (+52%)

apache-2.0

yuan-more/bigdata-book

Over a hundred big data e-books with download links, covering computer fundamentals, Java, hadoop, spark, flink, kafka, hbase, hive, data warehousing, and more

75 (+47%)

datawhalechina/juicy-bigdata

🎉🎉🐳 Datawhale introductory tutorial on big data processing | The opening course of the big data technology track 🎉🎉

282 (+40%)