7 results found

2.3k
23.0k
apache-2.0
535
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
11,149 commits to master branch, last one a day ago
More than 2,000 data engineer interview questions.
Created 2021-08-08
18 commits to master branch, last one 3 months ago
34
131
bsd-2-clause
18
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created 2017-11-06
60 commits to master branch, last one 5 years ago
27
69
apache-2.0
31
Learn how to use Spark SQL and HSpark connector package to create / query data tables that reside in HBase region servers
Created 2017-08-31
107 commits to master branch, last one 5 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created 2019-08-27
351 commits to master branch, last one 15 days ago
Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago