6 results found
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
10,586 commits to master branch, last one 2 days ago
More than 2,000 data engineer interview questions.
Created 2021-08-08
16 commits to master branch, last one 12 days ago
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created 2017-11-06
60 commits to master branch, last one 4 years ago
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.
Created 2017-08-31
107 commits to master branch, last one 5 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created 2019-08-27
290 commits to master branch, last one 5 days ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago