Statistics for topic spark
RepositoryStats tracks 518,986 GitHub repositories. Of these, 497 are tagged with the spark topic. The most common primary language for repositories using this topic is Scala (131). Other languages include Java (95), Python (91), Jupyter Notebook (55), JavaScript (15), and Shell (12).
Stargazers over time for topic spark
Most starred repositories for topic spark
Trending repositories for topic spark
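cube studio: an open-source, cloud-native, all-in-one machine learning/deep learning AI platform. It supports SSO login, multi-tenancy/multiple project groups, big data platform integration, online notebook development, drag-and-drop pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference services with vGPU, edge computing, serverless, an annotation platform, automated annotation, dataset management, large-model fine-tuning, vLLM large-model inference, LLMOps, private knowledge bases, and an AI model app store. It supports one-click model development/inference/fine-tuning, and supports...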
Apache Doris is an easy-to-use, high performance and unified analytics database.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Apache Spark - A unified analytics engine for large-scale data processing
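智元 IIM is an open-source, web-based instant messaging system that also offers AI chat. It supports AI assistants such as ChatGPT, Midjourney, ERNIE Bot (文心一言), iFlytek Spark (讯飞星火), and Tongyi Qianwen (通义千问).
🚀 A reverse-engineered API for the iFlytek Spark (讯飞星火) large model, for free-of-charge testing [specialty: office assistant]. It supports high-speed streaming output, agent conversations, web search, AI image generation, long-document interpretation, image analysis, and multi-turn dialogue, with zero-configuration deployment, multi-token support, and automatic cleanup of session traces.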
Spark-Dashboard is a solution for monitoring Apache Spark jobs. This repository provides the tooling and configuration for deploying an Apache Spark Performance Dashboard using container technology.
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Prod...
More than 2,000 data engineer interview questions.
End-to-end data engineering project with Kafka, Airflow, Spark, Postgres, and Docker.
Code/Notes for the Data Engineering Zoomcamp by DataTalksClub
🎨 UI for the Free Data Engineering Zoomcamp Course provided by DataTalksClub
Sample project to demonstrate data engineering best practices
PilotScope is a middleware to bridge the gaps of deploying AI4DB (Artificial Intelligence for Databases) algorithms into actual database systems.
GUI for the ChatGPT API and many LLMs. Supports agents, file-based QA, GPT fine-tuning, and querying with web search, all with a neat UI.
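Create streaming data, transfer it to Kafka, modify it with PySpark, and load it into Elasticsearch and MinIO.
XL-LightHouse is a general-purpose streaming big-data statistics system that supports very large data volumes and very high concurrency. Common use cases include PV and UV statistics; e-commerce sales and ordering-user counts; log volume statistics; API call counts, error counts, and latency statistics; and server operations metric monitoring. The system supports multi-dimensional statistics and complex conditional filtering and logical checks, with one-click deployment and one-line-of-code integration, making real-time statistics over massive data easy. It helps enterprises quickly build a data metrics system at lower cost and is a strong aid for cutting costs and improving efficiency.
Know your data better! Datavines is a next-gen data observability platform that supports metadata management and data quality.
The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center ...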
A Python package to submit and manage Apache Spark applications on Kubernetes.