Trending repositories for topic apache-spark

Last 3 days (new repositories)

No newly created repositories trending in the last 3 days.

Last 3 days (absolute gain)

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+26)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+4)

apache-2.0

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+2)

apache-2.0

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+2)

airscholar/e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All compone...

209 (+1)

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+1)

apache-2.0

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+1)

cc0-1.0

Last 3 days (relative gain)

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+0.8%)

airscholar/e2e-data-engineering

209 (+0.5%)

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+0.2%)

apache-2.0

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+0.1%)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+0.1%)

apache-2.0

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+0.1%)

apache-2.0

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+0.1%)

cc0-1.0

Last week (new repositories)

No newly created repositories trending in the last week.

Last week (absolute gain)

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+43)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+11)

apache-2.0

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

1,325 (+6)

mit

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

657 (+3)

mit

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+3)

cc0-1.0

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+3)

apache-2.0

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+3)

coder2j/pyspark-tutorial

PySpark Tutorial for Beginners - Practical Examples in Jupyter Notebook with Spark version 3.4.1. The tutorial covers various topics like Spark Introduction, Spark Installation, Spark RDD Transformati...

88 (+2)

mit

airscholar/e2e-data-engineering

209 (+2)

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+2)

apache-2.0

microsoft/SynapseML

Simple and Distributed Machine Learning

5,086 (+2)

mit

infoslack/awesome-kafka

A list about Apache Kafka

579 (+1)

databricks/LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

1,214 (+1)

apache-2.0

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1,303 (+1)

mit

japila-books/apache-spark-internals

The Internals of Apache Spark

1,486 (+1)

apache-2.0

feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

1,987 (+1)

apache-2.0

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

2,032 (+1)

mit

big-data-europe/docker-spark

Apache Spark docker image

2,045 (+1)

Last week (relative gain)

coder2j/pyspark-tutorial

88 (+2%)

mit

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+1%)

airscholar/e2e-data-engineering

209 (+1.0%)

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

657 (+0.5%)

mit

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

1,325 (+0.5%)

mit

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+0.3%)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+0.2%)

apache-2.0

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+0.2%)

apache-2.0

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+0.2%)

cc0-1.0

infoslack/awesome-kafka

A list about Apache Kafka

579 (+0.2%)

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+0.1%)

apache-2.0

databricks/LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

1,214 (+0.1%)

apache-2.0

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1,303 (+0.1%)

mit

japila-books/apache-spark-internals

The Internals of Apache Spark

1,486 (+0.1%)

apache-2.0

feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

1,987 (+0.1%)

apache-2.0

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

2,032 (+0.0%)

mit

big-data-europe/docker-spark

Apache Spark docker image

2,045 (+0.0%)

microsoft/SynapseML

Simple and Distributed Machine Learning

5,086 (+0.0%)

mit

Last month (new repositories)

No newly created repositories trending in the last month.

Last month (absolute gain)

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+227)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+40)

apache-2.0

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

1,325 (+32)

mit

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

657 (+20)

mit

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+20)

apache-2.0

microsoft/SynapseML

Simple and Distributed Machine Learning

5,086 (+18)

mit

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1,303 (+13)

mit

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+13)

cc0-1.0

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+13)

databricks/LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

1,214 (+10)

apache-2.0

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+9)

apache-2.0

LucaCanali/sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of...

715 (+6)

apache-2.0

japila-books/apache-spark-internals

The Internals of Apache Spark

1,486 (+6)

apache-2.0

cartershanklin/pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

430 (+5)

cc0-1.0

dotnet/spark

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

2,032 (+5)

mit

coder2j/pyspark-tutorial

88 (+4)

mit

airscholar/e2e-data-engineering

209 (+4)

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

43 (+3)

lynnlangit/learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

187 (+3)

apache-2.0

dataflint/spark

Performance Observability for Apache Spark

203 (+3)

apache-2.0

Last month (relative gain)

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

43 (+8%)

airscholar/RealtimeStreamingEngineering

This project serves as a comprehensive guide to building an end-to-end data engineering pipeline using TCP/IP Socket, Apache Spark, OpenAI LLM, Kafka and Elasticsearch. It covers each stage from data ...

32 (+7%)

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+5%)

awslabs/amazon-emr-cli

A command-line interface for packaging, deploying, and running your EMR Serverless Spark jobs

41 (+5%)

apache-2.0

coder2j/pyspark-tutorial

88 (+5%)

mit

openucx/sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

47 (+4%)

bsd-3-clause

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

657 (+3%)

mit

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

1,325 (+2%)

mit

airscholar/e2e-data-engineering

209 (+2%)

lynnlangit/learning-hadoop-and-spark

Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning

187 (+2%)

apache-2.0

dataflint/spark

Performance Observability for Apache Spark

203 (+2%)

apache-2.0

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+1%)

apache-2.0

radoslawkrolikowski/financial-market-data-analysis

Real-Time Financial Market Data Processing and Prediction application

82 (+1%)

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+1%)

apache-2.0

cartershanklin/pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

430 (+1%)

cc0-1.0

dimajix/flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

94 (+1%)

apache-2.0

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1,303 (+1%)

mit

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+0.9%)

apache-2.0

tirthajyoti/Spark-with-Python

Fundamentals of Spark with Python (using PySpark), code examples

339 (+0.9%)

mit

LucaCanali/sparkMeasure

715 (+0.8%)

apache-2.0

Last 12 months (new repositories)

owenrh/spark-fires

Spark fires is an anti-pattern playground where we deliberately break Spark applications in various ways so you can observe what happens and potentially recognise the issue when you come across it in y...

Last 12 months (absolute gain)

mlflow/mlflow

Open source platform for the machine learning lifecycle

19,057 (+2,946)

apache-2.0

treeverse/lakeFS

lakeFS - Data version control for your data lake | Git for data

4,503 (+664)

apache-2.0

ptyadana/SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.

1,303 (+346)

mit

kubeflow/spark-operator

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

2,818 (+318)

apache-2.0

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+256)

aloneguid/parquet-dotnet

Fully managed Apache Parquet implementation

657 (+217)

mit

dataflint/spark

Performance Observability for Apache Spark

203 (+198)

apache-2.0

microsoft/SynapseML

Simple and Distributed Machine Learning

5,086 (+196)

mit

airscholar/e2e-data-engineering

209 (+184)

awesome-spark/awesome-spark

A curated list of awesome Apache Spark packages and resources.

1,736 (+178)

cc0-1.0

databricks/LearningSparkV2

This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

1,214 (+171)

apache-2.0

san089/goodreads_etl_pipeline

An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.

1,325 (+140)

mit

cartershanklin/pyspark-cheatsheet

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

430 (+118)

cc0-1.0

mrpowers-io/quinn

pyspark methods to enhance developer productivity 📣 👯 🎉

653 (+112)

apache-2.0

feathr-ai/feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

1,987 (+107)

apache-2.0

LucaCanali/sparkMeasure

715 (+102)

apache-2.0

japila-books/apache-spark-internals

The Internals of Apache Spark

1,486 (+87)

apache-2.0

coder2j/pyspark-tutorial

88 (+81)

mit

intel-analytics/BigDL-2.x

BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray

2,668 (+74)

apache-2.0

big-data-europe/docker-spark

Apache Spark docker image

2,045 (+71)

Last 12 months (relative gain)

josephmachado/efficient_data_processing_spark

Code for "Efficient Data Processing in Spark" Course

260 (+6,400%)

dataflint/spark

Performance Observability for Apache Spark

203 (+3,960%)

apache-2.0

coder2j/pyspark-tutorial

88 (+1,157%)

mit

Ditectrev/Amazon-Web-Services-Certified-AWS-Certified-Machine-Learning-MLS-C01-Practice-Tests-Exams-Question

⛳️ PASS: Amazon Web Services Certified (AWS Certified) Machine Learning Specialty (MLS-C01) by learning based on our Questions & Answers (Q&A) Practice Tests Exams.

43 (+975%)

airscholar/e2e-data-engineering

209 (+736%)

airscholar/changecapture-e2e

This project shows how to capture changes from postgres database and stream them into kafka

31 (+417%)

Joshua-omolewa/Stock_streaming_pipeline_project

Built a real-time streaming pipeline to extract stock data, using Apache Nifi, Debezium, Kafka, and Spark Streaming. Loaded the transformed data into Glue database and created real-time dashboards usi...

25 (+178%)
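
How the relative-gain figures relate to the absolute-gain figures

The listing does not state its formula, but the numbers are consistent with dividing the absolute gain by the star count at the start of the period (current stars minus the gain). The sketch below checks that assumption against four entries that appear in both 12-month lists. The function name relative_gain is hypothetical and is not taken from the source site.

```python
def relative_gain(current_stars: int, absolute_gain: int) -> float:
    """Percentage growth relative to the star count at the start of the period.

    Assumption: the starting count is current_stars - absolute_gain.
    """
    starting_stars = current_stars - absolute_gain
    if starting_stars <= 0:
        raise ValueError("starting star count must be positive")
    return 100.0 * absolute_gain / starting_stars


# (current stars, absolute 12-month gain) copied from the listing above
examples = {
    "josephmachado/efficient_data_processing_spark": (260, 256),
    "dataflint/spark": (203, 198),
    "coder2j/pyspark-tutorial": (88, 81),
    "airscholar/e2e-data-engineering": (209, 184),
}

for repo, (stars, gain) in examples.items():
    print(f"{repo}: +{relative_gain(stars, gain):,.0f}%")
```

Rounded to whole percentages, these come out to 6,400%, 3,960%, 1,157% and 736%, matching the relative-gain entries listed for the same repositories. This also shows why small repositories dominate the relative lists: a few hundred new stars on a base of four or five produces a far larger percentage than thousands of new stars on a base of sixteen thousand.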