8 results found

2.3k
23.4k
apache-2.0
535
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
11,229 commits to master branch, last one a day ago
More than 2,000 data engineer interview questions.
Created 2021-08-08
18 commits to master branch, last one 4 months ago
34
131
bsd-2-clause
18
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created 2017-11-06
60 commits to master branch, last one 5 years ago
27
69
apache-2.0
31
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
Created 2017-08-31
107 commits to master branch, last one 5 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created 2019-08-27
366 commits to master branch, last one about a month ago
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago
Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)
This repository has been archived
Created 2022-03-09
68 commits to master branch, last one 9 months ago