Statistics for topic pyspark
RepositoryStats tracks 616,864 GitHub repositories; of these, 115 are tagged with the pyspark topic. The most common primary language for repositories using this topic is Python (54). Other languages include Jupyter Notebook (29).
Stargazers over time for topic pyspark
Most starred repositories for topic pyspark
Trending repositories for topic pyspark
Lightweight and extensible compatibility layer between dataframe libraries!
Open Source LeetCode for PySpark, Spark, Pandas and DBT/Snowflake
Implementing best practices for PySpark ETL jobs and applications.
Hopsworks - Data-Intensive AI platform with a Feature Store
Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!
SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and PySpark.
A library that provides useful extensions to Apache Spark and PySpark.
Detailed notes and material from 2025 Data Engineering Zoomcamp by Datatalks.Club
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala and Java as examples.
Data Engineering examples for Airflow, Prefect, and Mage.ai; dbt for BigQuery, Redshift, ClickHouse, PostgreSQL; Spark/PySpark for Batch processing; and Kafka for Stream processing
Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipelines from simple, reusable components.
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
LakeSail's computation framework with a mission to unify batch processing, stream processing, and compute-intensive (AI) workloads.
Code for "Efficient Data Processing in Spark" Course
PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformati...
This project introduces PySpark, a powerful open-source framework for distributed data processing. We explore its architecture, components, and applications for real-time data analysis.