8 results found
- Filter by Primary Language:
- Java (2)
- Python (2)
- Go (1)
- Jupyter Notebook (1)
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created
2014-07-14
11,229 commits to master branch, last one a day ago
More than 2,000 data engineer interview questions.
Created
2021-08-08
18 commits to master branch, last one 4 months ago
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created
2017-11-06
60 commits to master branch, last one 5 years ago
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.
Created
2017-08-31
107 commits to master branch, last one 5 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created
2019-08-27
366 commits to master branch, last one about a month ago
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created
2019-11-16
15 commits to master branch, last one about a year ago
Data Engineering Project with Hadoop HDFS and Kafka
Created
2023-11-04
3 commits to main branch, last one about a year ago
Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)
This repository has been archived
Created
2022-03-09
68 commits to master branch, last one 9 months ago