Trending repositories for topic big-data
Blazing-fast query execution engine that speaks the Apache Spark language and has Arrow-DataFusion at its core.
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
Apache Spark - A unified analytics engine for large-scale data processing
QuestDB is an open source time-series database for fast ingest and SQL queries
The official home of the Presto distributed SQL query engine for big data
Apache Beam is a unified programming model for Batch and Streaming data processing.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Apache DataFusion Ballista Distributed Query Engine
Yet another repository with basic concepts, technical challenges, and resources on data engineering, in Spanish 🧙✨
ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.
Distributed DataFrame for Python designed for the cloud, powered by Rust
An open-source, high-performance SQL vector database built on ClickHouse.
📙 Awesome Data Catalogs and Observability Platforms.
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
ParquetSharp is a .NET library for reading and writing Apache Parquet files.
Apache Wayang (incubating) is the first cross-platform data processing system.
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Apache Paimon Rust: the Rust implementation of Apache Paimon.
Bigtop Manager provides a modern, low-threshold web application to simplify the deployment and management of components for Bigtop, similar to Apache Ambari and Cloudera Manager.
A curated list of awesome Online Analytical Processing databases, frameworks, resources and other awesomeness.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
A lightweight helper utility that lets developers do interactive pipeline development with a single code base for both DLT runs and non-DLT interactive notebook runs.
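XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency (a single-machine version is also available). Common use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call counts, error counts, and latency statistics; and monitoring of server operations metrics. The system supports multi-dimensional statistics and a wide range of complex filter conditions and logical checks, offers one-click deployment and one-line-of-code integration, makes real-time statistics over massive data easy to implement, and helps enterprises quickly build a data metrics system at lower cost. It is an enterprise...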
This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessary infrastructure components, including Apache Flink, Elasticsea...
A virtual scrolling list component for Vue 3 that can be sorted by dragging
🦖 A SQL-on-everything Query Engine you can execute over multiple databases and file formats. Query your data, where it lives.
High performance data processing employs high performance computing (HPC) to process data, which is then translated into information and knowledge. The advent of high-performance computing and data an...
Course covers big data fundamentals, processes, technologies, platform ecosystem, and management for practical application development.
Best-in-class stream processing, analytics, and management. Perform continuous analytics, or build event-driven applications, real-time ETL pipelines, and feature stores in minutes. Unified streaming ...