43 results found

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Created 2015-01-23
543 commits to master branch, last one 5 years ago
5.3k
22.9k
apache-2.0
881
Redisson - Easy Redis Java client and Real-Time Data Platform. Valkey compatible. Sync/Async/RxJava/Reactive API. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Qu...
Created 2014-01-11
9,335 commits to master branch, last one 3 days ago
4.2k
15.4k
unknown
443
Big Data Beginner's Guide :star:
Created 2019-03-10
607 commits to master branch, last one about a year ago
1.2k
6.6k
apache-2.0
126
Enterprise job scheduling middleware with distributed computing ability.
Created 2020-03-16
1,174 commits to master branch, last one 3 months ago
535
2.7k
bsd-3-clause
267
Python clone of Spark, a MapReduce-like framework in Python
This repository has been archived
Created 2012-04-11
1,467 commits to master branch, last one 3 years ago
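:dart: :star2: [Big Data Interview Questions] A collection of big data interview questions gathered from around the web, with my own answer summaries. Currently covers interview questions for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.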
Created 2019-08-12
80 commits to master branch, last one 3 years ago
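Big data knowledge repository covering data warehouse modeling, real-time computing, big data, data middle platforms, system design, Java, algorithms, and more.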
Created 2020-06-10
1,059 commits to master branch, last one 6 days ago
MapReduce, Spark, Java, and Scala for Data Algorithms Book
Created 2014-08-06
785 commits to master branch, last one 2 years ago
212
940
mit
145
C# and F# language binding and extensions to Apache Spark
Created 2015-10-27
1,083 commits to master branch, last one 4 months ago
distributed_computing includes mapreduce, kvstore, etc.
Created 2017-01-16
43 commits to master branch, last one 3 years ago
338
743
other
97
An open source framework for building data analytic applications.
Created 2014-08-02
52,744 commits to develop branch, last one 28 days ago
41
692
mit
21
🐎 A serverless MapReduce framework written for AWS Lambda
Created 2018-04-02
88 commits to master branch, last one 3 years ago
35
547
apache-2.0
27
A serverless cluster computing system for the Go programming language
Created 2019-09-10
340 commits to master branch, last one about a year ago
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Created 2015-03-16
124 commits to master branch, last one about a year ago
134
360
apache-2.0
22
Uniffle is a high performance, general purpose Remote Shuffle Service.
Created 2022-06-17
1,001 commits to master branch, last one 18 hours ago
Dynamic execution framework for your Redis data
Created 2018-10-21
1,131 commits to master branch, last one about a month ago
222
342
other
34
Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster.
Created 2009-02-04
2,528 commits to 4.5 branch, last one 9 days ago
120
321
apache-2.0
17
Compass is a task diagnosis platform for bigdata
Created 2023-03-29
482 commits to main branch, last one about a month ago
74
249
other
12
Firestorm is a Remote Shuffle Service, and provides the capability for Apache Spark and Apache Hadoop MapReduce applications to store shuffle data on remote servers
Created 2021-10-29
142 commits to master branch, last one about a year ago
🎉🎉🐳 Datawhale Introduction to Big Data Processing tutorial | The opening course in the big data technology track 🎉🎉
Created 2021-11-18
322 commits to master branch, last one about a year ago
An easy-to-use Map Reduce Go parallel-computing framework inspired by 2021 6.824 lab1. It currently supports multiple worker threads and multiple processes on a single machine.
Created 2021-11-17
43 commits to master branch, last one 7 months ago
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Created 2019-12-10
479 commits to master branch, last one 11 months ago
Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Created 2019-06-22
212 commits to master branch, last one 3 months ago
An in-process MapReduce library to help you optimize service response time or concurrent task processing.
Created 2021-12-31
29 commits to main branch, last one about a year ago
Distributed crawling of data from all Chinese universities, using Hadoop and MapReduce.
Created 2023-04-10
76 commits to master branch, last one 7 months ago
Distributed systems course (MIT6.824) implemented in Java
Created 2020-12-31
34 commits to main branch, last one about a year ago
9
124
apache-2.0
9
DTail is a distributed DevOps tool for tailing, grepping, catting logs and other text files on many remote machines at once.
Created 2019-08-24
462 commits to master branch, last one 2 months ago
13
116
apache-2.0
13
Asakusa Framework
Created 2011-03-30
2,946 commits to master branch, last one 3 years ago
27
105
apache-2.0
1
Tangseng search engine, including full-text search and vector search, based on Golang. A Go-based search engine and information retrieval system.
Created 2023-05-20
354 commits to main branch, last one 3 months ago