Statistics for topic spark
RepositoryStats tracks 663,734 GitHub repositories; 563 of these are tagged with the spark topic. The most common primary language for repositories using this topic is Scala (145). Other languages include Python (107), Java (104), Jupyter Notebook (67), JavaScript (18), Shell (12), and Go (11).
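The sketch below sanity-checks the figures above by converting the stated counts into percentages. It uses only the numbers quoted on this page. The "Other/unlisted" bucket is an assumption: it lumps together every spark-tagged repository whose primary language is not one of the seven listed, including any with no detected language.

```python
# Counts exactly as stated on this page for the "spark" topic.
TOTAL_REPOS = 663_734
SPARK_REPOS = 563
LANGUAGE_COUNTS = {
    "Scala": 145,
    "Python": 107,
    "Java": 104,
    "Jupyter Notebook": 67,
    "JavaScript": 18,
    "Shell": 12,
    "Go": 11,
}


def language_shares(counts, total):
    """Return each language's share of `total` as a percentage."""
    return {lang: 100 * n / total for lang, n in counts.items()}


def unlisted_count(counts, total):
    """Assumed 'other' bucket: repos whose language is not in `counts`."""
    return total - sum(counts.values())


if __name__ == "__main__":
    # Share of all tracked repositories that carry the spark topic.
    print(f"spark share of tracked repos: {100 * SPARK_REPOS / TOTAL_REPOS:.4f}%")
    # Per-language share of the 563 spark-tagged repositories.
    for lang, pct in language_shares(LANGUAGE_COUNTS, SPARK_REPOS).items():
        print(f"{lang:<17} {pct:5.1f}%")
    rest = unlisted_count(LANGUAGE_COUNTS, SPARK_REPOS)
    print(f"{'Other/unlisted':<17} {100 * rest / SPARK_REPOS:5.1f}% ({rest} repos)")
```

With these inputs, the seven listed languages account for 464 of the 563 repositories. Scala alone covers roughly a quarter of them, and the topic as a whole makes up well under 0.1% of all tracked repositories.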
Stargazers over time for topic spark
Most starred repositories for topic spark
Trending repositories for topic spark
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Spark - A unified analytics engine for large-scale data processing
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
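This repo contains a Big Data project: "Real Time Twitter Sentiment Analysis via Kafka, Spark Streaming, MongoDB and Django Dashboard".
💭 A customizable web MVP prototype template for a single-turn Chat Bot, built with mainstream technologies such as Vue 3, Vite 6, TypeScript, Naive UI, Pinia (v3), and UnoCSS. 🧤 Easily integrates LLM APIs and uses a single-turn AI Q&A mode in which each question is answered independently with no context needed; supports typewriter-style streaming output and integrates markdown-it Mermaid/KaTex/L...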
Open source project for data preparation for GenAI applications
WeDataSphere is a financial-grade, one-stop big data platform suite.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Spark - A unified analytics engine for large-scale data processing
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
This repo contains a Big Data project: "Real Time Twitter Sentiment Analysis via Kafka, Spark Streaming, MongoDB and Django Dashboard".
💭 A customizable web MVP prototype template for a single-turn Chat Bot, built with mainstream technologies such as Vue 3, Vite 6, TypeScript, Naive UI, Pinia (v3), and UnoCSS. 🧤 Easily integrates LLM APIs and uses a single-turn AI Q&A mode in which each question is answered independently with no context needed; supports typewriter-style streaming output and integrates markdown-it Mermaid/KaTex/L...
Create streaming data, send it to Kafka, transform it with PySpark, and load it into Elasticsearch and MinIO.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Spark - A unified analytics engine for large-scale data processing
Apache Doris is an easy-to-use, high performance and unified analytics database.
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AW...
📈 A scalable, production-ready data pipeline for real-time streaming & batch processing, integrating Kafka, Spark, Airflow, AWS, Kubernetes, and MLflow. Supports end-to-end data ingestion, transforma...
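💭 A customizable web MVP prototype template for a single-turn Chat Bot, built with mainstream technologies such as Vue 3, Vite 6, TypeScript, Naive UI, Pinia (v3), and UnoCSS. 🧤 Easily integrates LLM APIs and uses a single-turn AI Q&A mode in which each question is answered independently with no context needed; supports typewriter-style streaming output and integrates markdown-it Mermaid/KaTex/L...
💭 A customizable web MVP prototype template for a single-turn Chat Bot, built with mainstream technologies such as Vue 3, Vite 6, TypeScript, Naive UI, Pinia (v3), and UnoCSS. 🧤 Easily integrates LLM APIs and uses a single-turn AI Q&A mode in which each question is answered independently with no context needed; supports typewriter-style streaming output and integrates markdown-it Mermaid/KaTex/L...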
Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake
PDF DataSource for Apache Spark: reads PDF files directly into a DataFrame and runs OCR on them.
Data Engineering Zoomcamp is a free nine-week course that covers the fundamentals of data engineering.
Apache Spark - A unified analytics engine for large-scale data processing
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Apache Doris is an easy-to-use, high performance and unified analytics database.
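Enterprise-grade LLM API rapid-integration system supporting OpenAI, Azure, ERNIE Bot, iFlytek Spark, Tongyi Qianwen, Zhipu GLM, Gemini, DeepSeek, Anthropic Claude, and OpenAI-format models, among others. Clean page design; lightweight, efficient, and stable; supports one-click Docker deployment.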
Open source project for data preparation for GenAI applications
This is a repository to showcase my details, skills, and projects and to track my progress in Data Analytics and Data Science.
Big data computing platform based on Spark <至轻云 - ultra-lightweight big data computing platform / data center / master data>