Statistics for topic spark
RepositoryStats tracks 584,796 GitHub repositories, of which 536 are tagged with the spark topic. The most common primary language for repositories using this topic is Scala (136). Other languages include Java (103), Python (99), Jupyter Notebook (60), JavaScript (17), and Shell (13).
Stargazers over time for topic spark
Most starred repositories for topic spark
Trending repositories for topic spark
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
Apache Spark - A unified analytics engine for large-scale data processing
Apache Doris is an easy-to-use, high performance and unified analytics database.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
Entity Matching Model solves the problem of matching company names between two possibly very large datasets.
🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile pla...
Open source project for data preparation of LLM application builders
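XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency (a standalone version is also supported). Common use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call volume, error count, and latency statistics; and server operations metrics monitoring. The system supports multidimensional statistics and a wide range of complex filter conditions and logical checks, with one-click deployment and one-line-of-code integration, making real-time statistics over massive data easy and helping enterprises quickly build a data metrics system at lower cost. It is an enterprise...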
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Apache Spark - A unified analytics engine for large-scale data processing
Apache Doris is an easy-to-use, high performance and unified analytics database.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
Open source project for data preparation of LLM application builders
🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile pla...
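💭 A Chat Bot conversational web front-end MVP prototype template open to further development, built on mainstream technologies such as Vue3, Vite 5, TypeScript, Naive UI, and UnoCSS. 🧤 Simple integration with large-model APIs, using a single-turn AI Q&A mode in which each question gets an independent response with no context required. Supports typewriter-style streaming output and integrates markdown-it preview. 💼 Easy to customize for quickly building chat-style large language model products (...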
Entity Matching Model solves the problem of matching company names between two possibly very large datasets.
Apache Spark - A unified analytics engine for large-scale data processing
Apache Doris is an easy-to-use, high performance and unified analytics database.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
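💭 A Chat Bot conversational web front-end MVP prototype template open to further development, built on mainstream technologies such as Vue3, Vite 5, TypeScript, Naive UI, and UnoCSS. 🧤 Simple integration with large-model APIs, using a single-turn AI Q&A mode in which each question gets an independent response with no context required. Supports typewriter-style streaming output and integrates markdown-it preview. 💼 Easy to customize for quickly building chat-style large language model products (...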
Open source project for data preparation of LLM application builders
Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake
🎹 Moodify - an emotion-based music recommendation system that uses AI/ML models to analyze text, speech, and facial expressions, providing personalized music recommendations across web and mobile pla...
Declarative text based tool for data analysts and engineers to extract, load, transform and orchestrate their data pipelines.
LakeSail's computation framework with a mission to unify stream processing, batch processing, and compute-intensive (AI) workloads.
Open source project for data preparation of LLM application builders
Apache Spark - A unified analytics engine for large-scale data processing
Apache Doris is an easy-to-use, high performance and unified analytics database.
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
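XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency (a standalone version is also supported). Common use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call volume, error count, and latency statistics; and server operations metrics monitoring. The system supports multidimensional statistics and a wide range of complex filter conditions and logical checks, with one-click deployment and one-line-of-code integration, making real-time statistics over massive data easy and helping enterprises quickly build a data metrics system at lower cost. It is an enterprise...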
This is a repository to demonstrate my details, skills, projects and to keep track of my progression in Data Analytics and Data Science topics.
A Python package to submit and manage Apache Spark applications on Kubernetes.
The Lakehouse Engine is a configuration driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...