43 results found

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Created 2015-01-23
543 commits to master branch, last one 6 years ago
4.3k
16.3k
unknown
448
Beginner's Guide to Big Data :star:
Created 2019-03-10
607 commits to master branch, last one 2 years ago
1.3k
7.4k
apache-2.0
124
Enterprise job scheduling middleware with distributed computing ability.
Created 2020-03-16
1,267 commits to master branch, last one 4 months ago
530
2.7k
bsd-3-clause
266
Python clone of Spark, a MapReduce-like framework in Python
This repository has been archived
Created 2012-04-11
1,467 commits to master branch, last one 4 years ago
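:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from around the web, together with my own summarized answers. Currently covers interview questions and notes for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.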
Created 2019-08-12
80 commits to master branch, last one 4 years ago
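Big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.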
Created 2020-06-10
1,088 commits to master branch, last one 5 months ago
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created 2014-08-06
787 commits to master branch, last one 6 months ago
211
940
mit
141
C# and F# language binding and extensions to Apache Spark
Created 2015-10-27
1,083 commits to master branch, last one about a year ago
distributed_computing: includes MapReduce, kvstore, etc.
Created 2017-01-16
43 commits to master branch, last one 4 years ago
345
768
other
95
An open source framework for building data analytic applications.
Created 2014-08-02
53,042 commits to develop branch, last one 5 days ago
40
693
mit
21
🐎 A serverless MapReduce framework written for AWS Lambda
Created 2018-04-02
88 commits to master branch, last one 3 years ago
34
553
apache-2.0
25
A serverless cluster computing system for the Go programming language
Created 2019-09-10
340 commits to master branch, last one about a year ago
155
414
apache-2.0
20
Uniffle is a high performance, general purpose Remote Shuffle Service.
Created 2022-06-17
1,334 commits to master branch, last one 3 days ago
t-Digest data structure in Python. Useful for percentiles and quantiles, including in distributed environments like PySpark
Created 2015-03-16
124 commits to master branch, last one 2 years ago
137
380
apache-2.0
18
Compass is a task diagnosis platform for big data
Created 2023-03-29
489 commits to main branch, last one 4 months ago
Dynamic execution framework for your Redis data
Created 2018-10-21
1,131 commits to master branch, last one 11 months ago
221
349
other
33
Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster.
Created 2009-02-04
2,544 commits to wip-4.6 branch, last one 5 days ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Created 2021-11-18
322 commits to master branch, last one about a year ago
72
255
other
10
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Created 2021-10-29
142 commits to master branch, last one 2 years ago
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab 1. It currently supports multiple worker threads and multiple processes on a single machine.
Created 2021-11-17
43 commits to master branch, last one about a year ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created 2019-12-10
479 commits to master branch, last one about a year ago
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Created 2019-06-22
219 commits to master branch, last one 4 months ago
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Created 2021-12-31
29 commits to main branch, last one 2 years ago
Distributed systems course (MIT 6.824) implemented in Java
Created 2020-12-31
34 commits to main branch, last one 2 years ago
Uses MapReduce's Java interface to crawl data on Chinese universities in a distributed fashion and to learn the basics of HDFS.
Created 2023-04-10
77 commits to master branch, last one 6 months ago
10
128
apache-2.0
8
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Created 2019-08-24
469 commits to master branch, last one 7 months ago
35
122
apache-2.0
1
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
Created 2023-05-20
359 commits to main branch, last one 3 months ago
13
116
apache-2.0
12
Asakusa Framework
Created 2011-03-30
2,946 commits to master branch, last one 4 years ago
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
Created 2016-01-19
107 commits to master branch, last one 10 months ago