66 results found

2.3k
23.0k
apache-2.0
535
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
11,136 commits to master branch, last one 18 hours ago
4.2k
15.9k
unknown
447
A beginner's guide to big data :star:
Created 2019-03-10
607 commits to master branch, last one about a year ago
6.0k
14.2k
other
657
Ceph is a distributed object, block, and file storage platform
Created 2011-09-01
149,357 commits to main branch, last one a day ago
967
10.9k
apache-2.0
113
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Created 2021-01-08
3,381 commits to main branch, last one a day ago
Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Created 2019-02-14
405 commits to master branch, last one 2 years ago
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Created 2015-01-02
1,091 commits to develop branch, last one 21 days ago
185
1.9k
mit
69
The Universal Storage Engine
Created 2017-03-31
5,016 commits to dev branch, last one a day ago
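:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered online, together with the author's own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.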
Created 2019-08-12
80 commits to master branch, last one 3 years ago
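A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.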
Created 2020-06-10
1,088 commits to master branch, last one 7 days ago
341
1.4k
mit
37
A native go client for HDFS
Created 2014-10-08
465 commits to master branch, last one about a year ago
305
1.2k
apache-2.0
33
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Created 2019-07-17
1,610 commits to master branch, last one 9 days ago
216
855
apache-2.0
128
A pure python HDFS client
This repository has been archived
Created 2013-05-07
348 commits to master branch, last one 2 years ago
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...
Created 2015-10-27
5,403 commits to master branch, last one about a month ago
Web tool for Kafka Connect
Created 2016-12-02
197 commits to master branch, last one 3 years ago
107
442
apache-2.0
14
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underl...
Created 2023-01-29
1,003 commits to dev2.0 branch, last one 20 days ago
Big Data Ecosystem Docker
Created 2020-03-20
82 commits to master branch, last one 2 years ago
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Created 2017-05-07
250 commits to master branch, last one 2 years ago
Fundamentals of Spark with Python (using PySpark), code examples
Created 2018-08-20
73 commits to master branch, last one 4 years ago
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Created 2020-02-05
119 commits to master branch, last one 7 months ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course for the big data technology track 🎉🎉
Created 2021-11-18
322 commits to master branch, last one about a year ago
102
270
mit
15
API and command line interface for HDFS
Created 2014-03-10
368 commits to master branch, last one about a month ago
82
269
mit
25
weather radar data processing - python package
Created 2016-02-19
1,624 commits to main branch, last one 3 days ago
82
213
other
27
⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to downl...
Created 2017-09-04
6,715 commits to master branch, last one 20 days ago
Python interface to the TileDB storage engine
Created 2017-05-19
2,027 commits to dev branch, last one a day ago
25
171
gpl-3.0
15
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
Created 2020-06-29
324 commits to master branch, last one 25 days ago
172
157
apache-2.0
92
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
This repository has been archived
Created 2016-06-02
3,040 commits to master branch, last one 4 years ago
Exports Hadoop HDFS content statistics to Prometheus
Created 2017-07-17
399 commits to master branch, last one 3 days ago
19
152
apache-2.0
5
seaweedfs implemented in pure Rust
Created 2023-08-20
49 commits to main branch, last one about a month ago
33
150
apache-2.0
31
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Created 2017-02-09
56 commits to master branch, last one 2 years ago