Statistics for topic big-data
RepositoryStats tracks 592,321 GitHub repositories, of which 362 are tagged with the big-data topic. The most common primary language for repositories using this topic is Java (92). Other languages include Python (57), Scala (31), Jupyter Notebook (27), C++ (21), Rust (17), JavaScript (15), TypeScript (14), and Go (13).
Stargazers over time for topic big-data
Most starred repositories for topic big-data
Trending repositories for topic big-data
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
LakeSoul is an end-to-end, real-time, cloud-native Lakehouse framework with fast data ingestion, concurrent updates, and incremental data analytics on cloud storage for both BI and AI applications.
An advanced and mature open-source MPP (Massively Parallel Processing) database. Open-source alternative to Greenplum Database.
Use CH-UI to work with your data from self-hosted ClickHouse through a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platfor...
Collaborative Datacenter Simulation and Exploration for Everybody
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Apache Wayang (incubating) is the first cross-platform data processing system.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Apache Spark - A unified analytics engine for large-scale data processing
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
A @ClickHouse fork that supports high-performance vector search and full-text search.
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Bigtop Manager provides a modern, low-threshold web application to simplify the deployment and management of components for Bigtop, similar to Apache Ambari and Cloudera Manager.