43 results found
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Created
2015-01-23
543 commits to master branch, last one 5 years ago
Beginner's Guide to Big Data :star:
Created
2019-03-10
607 commits to master branch, last one about a year ago
Enterprise job scheduling middleware with distributed computing ability.
Created
2020-03-16
1,267 commits to master branch, last one 14 days ago
Python clone of Spark, a MapReduce-like framework in Python
This repository has been archived
Created
2012-04-11
1,467 commits to master branch, last one 3 years ago
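:dart: :star2: [Big Data Interview Questions] A collection of big data interview questions gathered from around the web, with my own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.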
Created
2019-08-12
80 commits to master branch, last one 3 years ago
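A big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.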
Created
2020-06-10
1,088 commits to master branch, last one about a month ago
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created
2014-08-06
787 commits to master branch, last one 2 months ago
C# and F# language binding and extensions to Apache Spark
Created
2015-10-27
1,083 commits to master branch, last one 10 months ago
distributed_computing: includes MapReduce, a KV store, etc.
Created
2017-01-16
43 commits to master branch, last one 4 years ago
An open source framework for building data analytic applications.
Created
2014-08-02
52,872 commits to develop branch, last one a day ago
🐎 A serverless MapReduce framework written for AWS Lambda
Created
2018-04-02
88 commits to master branch, last one 3 years ago
A serverless cluster computing system for the Go programming language
Created
2019-09-10
340 commits to master branch, last one about a year ago
Uniffle is a high performance, general purpose Remote Shuffle Service.
Created
2022-06-17
1,254 commits to master branch, last one a day ago
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Created
2015-03-16
124 commits to master branch, last one 2 years ago
Dynamic execution framework for your Redis data
Created
2018-10-21
1,131 commits to master branch, last one 7 months ago
Compass is a task diagnosis platform for big data
Created
2023-03-29
489 commits to main branch, last one 28 days ago
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Created
2009-02-04
2,528 commits to 4.5 branch, last one 6 months ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course in the big data technology track 🎉🎉
Created
2021-11-18
322 commits to master branch, last one about a year ago
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Created
2021-10-29
142 commits to master branch, last one 2 years ago
An easy-to-use MapReduce parallel-computing framework in Go, inspired by the 2021 6.824 lab 1. It currently supports multiple worker threads or multiple processes on a single machine.
Created
2021-11-17
43 commits to master branch, last one about a year ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created
2019-12-10
479 commits to master branch, last one about a year ago
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Created
2019-06-22
219 commits to master branch, last one 10 days ago
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Created
2021-12-31
29 commits to main branch, last one 2 years ago
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Created
2014-12-04
920 commits to master branch, last one 17 days ago
A Java implementation of the distributed systems course (MIT 6.824)
Created
2020-12-31
34 commits to main branch, last one 2 years ago
Distributed crawling of data from all Chinese universities, using Hadoop and MapReduce.
Created
2023-04-10
77 commits to master branch, last one 2 months ago
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Created
2019-08-24
469 commits to master branch, last one 4 months ago
Tangseng search engine, including full-text search and vector search, based on Go. A Go-based search engine and information retrieval system.
Created
2023-05-20
354 commits to main branch, last one 9 months ago
Asakusa Framework
Created
2011-03-30
2,946 commits to master branch, last one 3 years ago
Teaching Materials for Distributed Statistical Computing (teaching materials for big data distributed computing)
Created
2016-01-19
107 commits to master branch, last one 6 months ago