Search Results - RepositoryStats

290

3.5k

apache-2.0

143

Fast, efficient, and scalable distributed map/reduce system with DAG execution, in memory or on disk, written in pure Go; runs standalone or distributed.

golang map-reduce distributed-systems distributed-computing

Created 2016-08-26

784 commits to master branch, last one a day ago

poseidon Qihoo360

432

2.0k

bsd-3-clause

153

A search engine which can hold 100 trillion lines of log data.

golang big-data poseidon map-reduce search-engine

Created 2016-10-20

40 commits to master branch, last one 7 years ago

numaflow numaproj

130

1.9k

apache-2.0

25

Kubernetes-native platform to run massively parallel data/streaming jobs

k8s pipeline kubernetes map-reduce hacktoberfest data-processing stream-processing

Created 2022-05-20

1,306 commits to main branch, last one a day ago

Transducers.jl JuliaFolds

25

437

mit

7

Efficient transducers for Julia

julia parallel iterators map-reduce transducers high-performance distributed-computing

Created 2018-12-23

1,020 commits to master branch, last one about a year ago

Spark-with-Python tirthajyoti

271

344

mit

10

Fundamentals of Spark with Python (using PySpark), code examples

sql hdfs mlib spark apache hadoop python pyspark big-data database analytics dataframe map-reduce apache-spark machine-learning parallel-computing distributed-computing

Created 2018-08-20

73 commits to master branch, last one 4 years ago

ThreadsX.jl tkf

10

327

mit

5

Parallelized Base functions

julia parallel map-reduce transducers high-performance sorting-algorithms

Created 2020-02-20

188 commits to master branch, last one 2 years ago

python-bigdata phelps-sg

165

135

unknown

8

Data science and Big Data with Python

hbase numpy spark python map-reduce data-science notebook-jupyter numerical-methods

Created 2016-07-14

31 commits to master branch, last one about a year ago

flox xarray-contrib

18

130

apache-2.0

4

Fast & furious GroupBy operations for dask.array

dask xarray map-reduce

Created 2021-03-24

653 commits to main branch, last one 12 days ago

prosto asavinov

5

90

mit

3

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

olap spark pandas python workflow map-reduce data-science data-wrangling data-processing data-preparation data-preprocessing feature-engineering business-intelligence

Created 2019-07-20

174 commits to master branch, last one 3 years ago

FoldsCUDA.jl JuliaFolds

5

57

mit

6

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

gpu cuda julia parallel iterators map-reduce transducers high-performance

Created 2020-10-11

104 commits to master branch, last one 3 years ago