Statistics for topic apache-spark
RepositoryStats tracks 595,856 GitHub repositories, of which 111 are tagged with the apache-spark topic. The most common primary language for repositories using this topic is Scala (28). Other languages include Python (27) and Jupyter Notebook (11).
Stargazers over time for topic apache-spark
Most starred repositories for topic apache-spark
Trending repositories for topic apache-spark
lakeFS - Data version control for your data lake | Git for data
A curated list of awesome Apache Spark packages and resources.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Code for "Efficient Data Processing in Spark" Course
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformati...
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data ...
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs
Spark fires is an anti-pattern playground where we deliberately break Spark applications in various ways so you can observe what happens and potentially recognise the issue when you come across it in y...
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.