5 results found

MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created 2014-08-06
787 commits to master branch, last one 5 months ago
Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark/Flink/MapReduce.
Created 2022-08-17
1 commit to main branch, last one 2 years ago
Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.
Created 2023-04-10
77 commits to master branch, last one 5 months ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are s...
Created 2019-08-27
370 commits to master branch, last one 2 months ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago