4 results found

MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created 2014-08-06
785 commits to master branch, last one 2 years ago
Cloud Shuffle Service (CSS) is a general-purpose remote shuffle solution for compute engines, including Spark, Flink, and MapReduce.
Created 2022-08-17
1 commit to main branch, last one about a year ago
Distributed crawling of data on all Chinese universities using Hadoop and MapReduce.
Created 2023-04-10
76 commits to master branch, last one 7 months ago
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago