Statistics for topic big-data
RepositoryStats tracks 639,267 Github repositories, of these 371 are tagged with the big-data topic. The most common primary language for repositories using this topic is Java (94). Other languages include: Python (60), Scala (32), Jupyter Notebook (27), C++ (22), Rust (18), JavaScript (15), Go (14), TypeScript (14)
Stargazers over time for topic big-data
Most starred repositories for topic big-data (view more)
Trending repositories for topic big-data (view more)
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
ClickHouse® is a real-time analytics database management system
Apache Spark - A unified analytics engine for large-scale data processing
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Wrangler Transform: A DMD system for transforming Big Data
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive (AI) workloads.
ClickHouse® is a real-time analytics database management system
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Apache Spark - A unified analytics engine for large-scale data processing
Distributed data engine for Python/SQL designed for the cloud, powered by Rust
Wrangler Transform: A DMD system for transforming Big Data
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
ClickHouse® is a real-time analytics database management system
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Apache Spark - A unified analytics engine for large-scale data processing
CortexBrain is an ambitious open source project aimed at creating an intelligent, lightweight, and efficient service mesh architecture to seamlessly connect cloud and edge devices
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
High performance data processing employs high performance computing (HPC) to process data, which is then translated into information and knowledge. The advent of high-performance computing and data an...
Use CH-UI to work with your data from Click House self-hosted with a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platfor...
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
ClickHouse® is a real-time analytics database management system
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
ParadeDB is a modern Elasticsearch alternative built on Postgres. Built for real-time, update-heavy workloads.
Apache Spark - A unified analytics engine for large-scale data processing
Bigtop Manager provides a modern, low-threshold web application to simplify the deployment and management of components for Bigtop, similar to Apache Ambari and Cloudera Manager.
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
Apache Paimon Rust The rust implementation of Apache Paimon.