66 results found
- Filter by Primary Language:
- Java (20)
- Python (15)
- Go (6)
- C++ (3)
- Shell (3)
- Scala (3)
- VBA (2)
- Jupyter Notebook (2)
- PLpgSQL (1)
- Perl (1)
- R (1)
- Ruby (1)
- Rust (1)
- JavaScript (1)
- Groovy (1)
- TypeScript (1)
- FreeMarker (1)
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC activ...
Created
2014-07-14
11,136 commits to master branch, last one 18 hours ago
Big Data Getting Started Guide :star:
Created
2019-03-10
607 commits to master branch, last one about a year ago
Ceph is a distributed object, block, and file storage platform
Created
2011-09-01
149,357 commits to main branch, last one a day ago
JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Created
2021-01-08
3,381 commits to main branch, last one a day ago
Focused on big data learning and interviews; start your journey to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Created
2019-02-14
405 commits to master branch, last one 2 years ago
Utils for streaming large files (S3, HDFS, gzip, bz2...)
Created
2015-01-02
1,091 commits to develop branch, last one 21 days ago
The Universal Storage Engine
Created
2017-03-31
5,016 commits to dev branch, last one a day ago
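:dart: :star2: [Big Data Interview Questions] A collection of big data interview questions gathered from the web, with the author's own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.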
Created
2019-08-12
80 commits to master branch, last one 3 years ago
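A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.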
Created
2020-06-10
1,088 commits to master branch, last one 7 days ago
A native go client for HDFS
Created
2014-10-08
465 commits to master branch, last one about a year ago
Addax is a versatile open-source ETL tool that can seamlessly transfer data between various RDBMS and NoSQL databases, making it an ideal solution for data migration.
Created
2019-07-17
1,610 commits to master branch, last one 9 days ago
A pure python HDFS client
This repository has been archived
Created
2013-05-07
348 commits to master branch, last one 2 years ago
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Functions, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML...
Created
2015-10-27
5,403 commits to master branch, last one about a month ago
Web tool for Kafka Connect
Created
2016-12-02
197 commits to master branch, last one 3 years ago
CloudEon uses Kubernetes to install and deploy open-source big data components, enabling the containerized operation of an open-source big data platform. This allows you to reduce your focus on underl...
Created
2023-01-29
1,003 commits to dev2.0 branch, last one 20 days ago
Big Data Ecosystem Docker
Created
2020-03-20
82 commits to master branch, last one 2 years ago
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Created
2017-05-07
250 commits to master branch, last one 2 years ago
Fundamentals of Spark with Python (using PySpark), code examples
Created
2018-08-20
73 commits to master branch, last one 4 years ago
A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Created
2020-02-05
119 commits to master branch, last one 7 months ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Created
2021-11-18
322 commits to master branch, last one about a year ago
API and command line interface for HDFS
Created
2014-03-10
368 commits to master branch, last one about a month ago
weather radar data processing - python package
Created
2016-02-19
1,624 commits to main branch, last one 3 days ago
Must-read Papers for File System (FS)
Created
2021-12-01
101 commits to main branch, last one about a month ago
⛈️ RumbleDB 1.22.0 "Pyrenean oak" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to downl...
Created
2017-09-04
6,715 commits to master branch, last one 20 days ago
Python interface to the TileDB storage engine
Created
2017-05-19
2,027 commits to dev branch, last one a day ago
A distributed storage benchmark for file systems, object stores & block devices with support for GPUs
Created
2020-06-29
324 commits to master branch, last one 25 days ago
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
This repository has been archived
Created
2016-06-02
3,040 commits to master branch, last one 4 years ago
Exports Hadoop HDFS content statistics to Prometheus
Created
2017-07-17
399 commits to master branch, last one 3 days ago
seaweedfs implemented in pure Rust
Created
2023-08-20
49 commits to main branch, last one about a month ago
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Created
2017-02-09
56 commits to master branch, last one 2 years ago