Statistics for topic apache-spark
RepositoryStats tracks 584,797 GitHub repositories, of which 109 are tagged with the apache-spark topic. The most common primary languages for repositories using this topic are Python (27) and Scala (27), followed by Jupyter Notebook (11).
Stargazers over time for topic apache-spark
Most starred repositories for topic apache-spark
Trending repositories for topic apache-spark
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
Code for "Efficient Data Processing in Spark" Course
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of...
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
lakeFS - Data version control for your data lake | Git for data
Spark fires is an anti-pattern playground where we deliberately break Spark applications in various ways so you can observe what happens and potentially recognise the issue when you come across it in y...
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformati...
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
This repository contains the code for a realtime election voting system. The system is built using Python, Kafka, Spark Streaming, Postgres and Streamlit. The system is built using Docker Compose to e...
This project shows how to capture changes from a Postgres database and stream them into Kafka.
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by studying our Questions & Answers (Q&A) practice test exams.