6 results found

2.2k · 21.4k · apache-2.0 · 535
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
10,586 commits to master branch, last one 2 days ago

More than 2,000 data engineer interview questions.
Created 2021-08-08
16 commits to master branch, last one 12 days ago

36 · 129 · bsd-2-clause · 18
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created 2017-11-06
60 commits to master branch, last one 4 years ago

27 · 68 · apache-2.0 · 31
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.
Created 2017-08-31
107 commits to master branch, last one 5 years ago

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created 2019-08-27
290 commits to master branch, last one 5 days ago

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago