61 results found

2.2k
21.4k
apache-2.0
535
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created 2014-07-14
10,586 commits to master branch, last one 2 days ago
4.2k
15.4k
unknown
443
A getting-started guide to big data :star:
Created 2019-03-10
607 commits to master branch, last one about a year ago
5.9k
13.4k
other
653
Ceph is a distributed object, block, and file storage platform
Created 2011-09-01
145,728 commits to main branch, last one 13 hours ago
877
9.9k
apache-2.0
112
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Created 2021-01-08
3,205 commits to main branch, last one 23 hours ago
Focused on big data learning and interviews; start down the road to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Created 2019-02-14
405 commits to master branch, last one about a year ago
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Created 2015-01-02
1,078 commits to develop branch, last one 23 days ago
179
1.8k
mit
71
The Universal Storage Engine
Created 2017-03-31
4,784 commits to dev branch, last one 22 hours ago
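:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own summarized answers. Currently covers interview topics for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.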
Created 2019-08-12
80 commits to master branch, last one 3 years ago
339
1.4k
mit
38
A native go client for HDFS
Created 2014-10-08
465 commits to master branch, last one 10 months ago
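A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.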
Created 2020-06-10
1,054 commits to master branch, last one 3 days ago
288
1.1k
apache-2.0
32
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Created 2019-07-17
1,407 commits to master branch, last one a day ago
216
858
apache-2.0
129
A pure python HDFS client
This repository has been archived
Created 2013-05-07
348 commits to master branch, last one 2 years ago
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...
Created 2015-10-27
5,331 commits to master branch, last one 4 days ago
Web tool for Kafka Connect
Created 2016-12-02
197 commits to master branch, last one 2 years ago
Kafka Connect HDFS connector
Created 2015-05-20
2,517 commits to master branch, last one 7 days ago
93
396
apache-2.0
13
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underl...
Created 2023-01-29
883 commits to dev branch, last one about a month ago
Big Data Ecosystem Docker
Created 2020-03-20
82 commits to master branch, last one 2 years ago
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Created 2017-05-07
250 commits to master branch, last one 2 years ago
Fundamentals of Spark with Python (using PySpark), code examples
Created 2018-08-20
73 commits to master branch, last one 3 years ago
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, AVRO, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Created 2020-02-05
119 commits to master branch, last one 2 months ago
99
267
mit
15
API and command line interface for HDFS
Created 2014-03-10
367 commits to master branch, last one 7 months ago
77
252
mit
25
weather radar data processing - python package
Created 2016-02-19
1,615 commits to main branch, last one 3 months ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Created 2021-11-18
322 commits to master branch, last one about a year ago
80
208
other
27
⛈️ RumbleDB 1.21.0 "Hawthorn blossom" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to d...
Created 2017-09-04
6,426 commits to master branch, last one 3 months ago
Python interface to the TileDB storage engine
Created 2017-05-19
1,932 commits to dev branch, last one 2 days ago
175
157
apache-2.0
93
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Created 2016-06-02
3,040 commits to master branch, last one 3 years ago
21
152
gpl-3.0
15
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
Created 2020-06-29
283 commits to master branch, last one 2 days ago
33
150
apache-2.0
32
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Created 2017-02-09
56 commits to master branch, last one 2 years ago
Exports Hadoop HDFS content statistics to Prometheus
Created 2017-07-17
349 commits to master branch, last one 13 hours ago