Trending repositories for topic apache-spark
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark, and PySpark.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Code for "Efficient Data Processing in Spark" Course
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
lakeFS - Data version control for your data lake | Git for data
A curated list of awesome Apache Spark packages and resources.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...
Feathr – A scalable, unified data and AI engineering platform for enterprise
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Includes notes on using Apache Spark in general, notes on using Spark for Physics, how to run TPCDS on PySpark, how to create histograms with Spark, tools for performance testing CPUs, Jupyter noteboo...
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of...
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformati...
⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.
GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs
In this project, we set up end-to-end data engineering using Apache Spark, Azure Databricks, and Data Build Tool (dbt), with Azure as our cloud provider.
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data ...
This project shows how to capture changes from a Postgres database and stream them into Kafka
Spark in Action, 2nd edition - chapter 1 - Introduction
BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
This repository contains the code for a real-time election voting system. The system is built using Python, Kafka, Spark Streaming, Postgres and Streamlit. The system is built using Docker Compose to e...
Built a real-time streaming pipeline to extract stock data, using Apache NiFi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into Glue database and created real-time dashboards usi...
Apache Tomcat exploit and pentesting guide for penetration testers
A Spark connector that reads data from and writes data to Arrow Flight endpoints with Arrow Flight and Flight SQL
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Real-Time Financial Market Data Processing and Prediction application
A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs