42 results found

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Created 2015-01-23
543 commits to master branch, last one 5 years ago
4.2k
15.9k
unknown
447
Big Data Getting Started Guide :star:
Created 2019-03-10
607 commits to master branch, last one about a year ago
1.3k
7.2k
apache-2.0
124
Enterprise job scheduling middleware with distributed computing ability.
Created 2020-03-16
1,248 commits to master branch, last one 21 days ago
532
2.7k
bsd-3-clause
267
Python clone of Spark, a MapReduce-like framework in Python
This repository has been archived
Created 2012-04-11
1,467 commits to master branch, last one 3 years ago
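:dart: :star2: [Big data interview questions] A collection of big data interview questions gathered from the web, with the author's own answer summaries. Currently covers interview question summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.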
Created 2019-08-12
80 commits to master branch, last one 3 years ago
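A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.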
Created 2020-06-10
1,088 commits to master branch, last one 7 days ago
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created 2014-08-06
787 commits to master branch, last one about a month ago
213
941
mit
143
C# and F# language binding and extensions to Apache Spark
Created 2015-10-27
1,083 commits to master branch, last one 9 months ago
distributed_computing: includes mapreduce, kvstore, etc.
Created 2017-01-16
43 commits to master branch, last one 4 years ago
342
761
other
96
An open source framework for building data analytic applications.
Created 2014-08-02
52,836 commits to develop branch, last one a day ago
41
695
mit
21
🐎 A serverless MapReduce framework written for AWS Lambda
Created 2018-04-02
88 commits to master branch, last one 3 years ago
35
551
apache-2.0
26
A serverless cluster computing system for the Go programming language
Created 2019-09-10
340 commits to master branch, last one about a year ago
149
385
apache-2.0
21
Uniffle is a high performance, general purpose Remote Shuffle Service.
Created 2022-06-17
1,235 commits to master branch, last one 2 days ago
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Created 2015-03-16
124 commits to master branch, last one 2 years ago
Dynamic execution framework for your Redis data
Created 2018-10-21
1,131 commits to master branch, last one 6 months ago
136
362
apache-2.0
18
Compass is a task diagnosis platform for big data
Created 2023-03-29
487 commits to main branch, last one 3 months ago
221
347
other
34
Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster.
Created 2009-02-04
2,528 commits to 4.5 branch, last one 5 months ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course of the big data technology track 🎉🎉
Created 2021-11-18
322 commits to master branch, last one about a year ago
72
252
other
12
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Created 2021-10-29
142 commits to master branch, last one 2 years ago
An easy-to-use MapReduce parallel-computing framework in Go, inspired by 2021 6.824 lab1. It currently supports multiple worker threads on a single machine and multiple processes on a single machine.
Created 2021-11-17
43 commits to master branch, last one about a year ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created 2019-12-10
479 commits to master branch, last one about a year ago
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Created 2019-06-22
216 commits to master branch, last one 4 months ago
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Created 2021-12-31
29 commits to main branch, last one 2 years ago
Distributed crawling of data from all Chinese universities with Hadoop and MapReduce.
Created 2023-04-10
77 commits to master branch, last one about a month ago
Distributed systems course (MIT 6.824) implemented in Java
Created 2020-12-31
34 commits to main branch, last one 2 years ago
9
126
apache-2.0
9
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Created 2019-08-24
469 commits to master branch, last one 2 months ago
13
116
apache-2.0
13
Asakusa Framework
Created 2011-03-30
2,946 commits to master branch, last one 3 years ago
30
114
apache-2.0
1
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
Created 2023-05-20
354 commits to main branch, last one 8 months ago
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
Created 2016-01-19
107 commits to master branch, last one 5 months ago