Statistics for topic big-data
RepositoryStats tracks 595,858 GitHub repositories, of which 363 are tagged with the big-data topic. The most common primary language for repositories using this topic is Java (92). Other languages include Python (59), Scala (31), Jupyter Notebook (27), C++ (21), Rust (17), JavaScript (15), TypeScript (14), and Go (13).
Stargazers over time for topic big-data
Most starred repositories for topic big-data
Trending repositories for topic big-data
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
LakeSoul is an end-to-end, realtime and cloud native Lakehouse framework with fast data ingestion, concurrent update and incremental data analytics on cloud storages for both BI and AI applications.
One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
A world wines dataset with user ratings for recommendation systems and general use.
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for...
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
A @ClickHouse fork that supports high-performance vector search and full-text search.
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
Apache Spark - A unified analytics engine for large-scale data processing
Bigtop Manager provides a modern, low-threshold web application to simplify the deployment and management of components for Bigtop, similar to Apache Ambari and Cloudera Manager.