7 results found
- Filter by Primary Language:
- Python (2)
- Go (1)
- Java (1)
- Jupyter Notebook (1)
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created
2014-07-14
11,149 commits to master branch, last one a day ago
More than 2,000 data engineer interview questions.
Created
2021-08-08
18 commits to master branch, last one 3 months ago
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Created
2017-11-06
60 commits to master branch, last one 5 years ago
Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers.
Created
2017-08-31
107 commits to master branch, last one 5 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created
2019-08-27
351 commits to master branch, last one 15 days ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created
2019-11-16
15 commits to master branch, last one about a year ago
Data Engineering Project with Hadoop HDFS and Kafka
Created
2023-11-04
3 commits to main branch, last one about a year ago