Search Results - RepositoryStats

2.4k

24.1k

apache-2.0

534

SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...

s3 fuse hdfs posix seaweedfs kubernetes s3-storage cloud-drive hadoop-hdfs replication blob-storage erasure-coding object-storage tiered-file-system distributed-storage distributed-systems distributed-file-system

Created 2014-07-14

11,413 commits to master branch, last one 4 days ago

data-engineering-interview-questions OBenner

467

1.3k

unknown

21

More than 2,000 data engineer interview questions.

Created 2021-08-08

19 commits to master branch, last one 2 months ago

dynamometer linkedin

34

131

bsd-2-clause

17

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

hdfs scale hadoop testing hdfs-dfs scale-up hadoop-hdfs testing-tools hadoop-framework performance-test hadoop-filesystem performance-metrics performance-testing performance-analysis

Created 2017-11-06

60 commits to master branch, last one 5 years ago

Data-Engineering-Project-with-HDFS-and-Kafka AhmetFurkanDEMIR

25

102

mit

3

Data Engineering Project with Hadoop HDFS and Kafka

Created 2023-11-04

3 commits to main branch, last one about a year ago

big_data groda

26

73

mit

2

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are s...

mrjob spark bigtop docker hadoop bigdata pyspark big-data mapreduce spark-sql testdfsio hadoop-hdfs apache-spark apache-sedona hadoop-cluster mapreduce-bash gutenberg-ebooks hadoop-mapreduce jupyter-notebook

Created 2019-08-27

370 commits to master branch, last one 3 months ago

sparksql-for-hbase IBM

27

69

apache-2.0

30

Learn how to use Spark SQL and the HSpark connector package to create and query data tables that reside in HBase region servers

sql hbase nosql spark ibmcode hadoop-hdfs apache-spark

Created 2017-08-31

107 commits to master branch, last one 5 years ago

datapipelines-essentials-python vim89

38

54

apache-2.0

4

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...

etl xml spark hadoop python pyspark python3 big-data datalake spark-sql hadoop-hdfs xml-parsing apache-spark etl-pipeline data-pipeline etl-framework etl-components hadoop-mapreduce

Created 2019-11-16

15 commits to master branch, last one about a year ago

TravelWebsite_BigDataAnalysis jarlor

1

32

mulanpsl-2.0

1

Big data analysis of a travel website (partial Ctrip data): Hadoop course project (undergraduate coursework level)

java bigdata mapreduce coursework hadoop-hdfs

This repository has been archived

Created 2022-03-09

68 commits to master branch, last one about a year ago