36 results found
- Filter by Primary Language:
- Scala (15)
- Python (6)
- Jupyter Notebook (4)
- C# (2)
- Dockerfile (2)
- Java (1)
- R (1)
- Rust (1)
- HTML (1)
- JavaScript (1)
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Created
2013-10-28
7,827 commits to master branch, last one a day ago
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created
2017-12-18
4,156 commits to master branch, last one 2 days ago
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Created
2019-04-22
372 commits to main branch, last one about a year ago
A Scala kernel for Jupyter
Created
2015-03-10
1,582 commits to main branch, last one 13 days ago
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Created
2021-12-06
4,099 commits to main branch, last one 5 hours ago
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created
2019-02-10
184 commits to master branch, last one 2 years ago
Big data platform for e-commerce user behavior analysis
Created
2018-06-21
45 commits to master branch, last one 5 years ago
Qubole Sparklens tool for performance tuning Apache Spark
Created
2018-03-16
54 commits to master branch, last one 3 years ago
The Internals of Spark SQL
Created
2017-12-26
1,545 commits to main branch, last one 2 months ago
🐍 Quick reference guide to common patterns & functions in PySpark.
Created
2019-03-07
32 commits to master branch, last one about a year ago
New Generation Open-Source Data Stack Demo
Created
2022-07-03
57 commits to main branch, last one about a year ago
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsigh...
Created
2019-03-14
960 commits to master branch, last one about a month ago
Use SQL to build ELT pipelines on a data lakehouse.
Created
2021-03-11
481 commits to main branch, last one 2 years ago
Apache Spark™ and Scala Workshops
Created
2016-03-10
318 commits to gh-pages branch, last one 2 years ago
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Created
2021-09-23
1,082 commits to main branch, last one 14 hours ago
Spark Structured Streaming / Kafka / Cassandra / Elastic
Created
2017-06-15
25 commits to master branch, last one 6 years ago
An encrypted data analytics platform
Created
2016-10-31
675 commits to master branch, last one about a year ago
Apache Spark 3 - Structured Streaming Course Material
Created
2020-07-21
29 commits to master branch, last one 4 years ago
Spark Connector to read and write with Pulsar
Created
2019-07-01
192 commits to master branch, last one 6 months ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created
2021-06-27
18 commits to hudi branch, last one 2 years ago
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Created
2018-03-26
60 commits to master branch, last one 3 years ago
This repository will help you learn Databricks concepts through examples. It will include all the important topics that we need in real-life experience as a data engineer. We wil...
Created
2022-05-10
45 commits to master branch, last one 2 years ago
Apache Spark Connect Client for Rust
Created
2023-09-18
81 commits to main branch, last one 6 days ago
Apache Spark Course Material
Created
2020-05-05
34 commits to master branch, last one 4 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created
2019-08-27
350 commits to master branch, last one 15 days ago
New generation open-source data stack
Created
2022-05-20
8 commits to main branch, last one 2 years ago
bring sf to spark in production
Created
2019-01-11
143 commits to master branch, last one 2 years ago
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that re...
parquet
consumer
spark-sql
streaming
spark-joins
spark-datadog
spark-mangodb
kafka-producer
spark-use-cases
spark-dataframes
spark-catalog-api
spark-hive-context
spark-with-mangodb
spark-streaming-data
spark-jdbc-connection
spark-transformations
cassandra-installation
spark-kafka-integration
spark-to-cassandra-connection
spark-aggregations-using-dataframe
Created
2016-05-04
191 commits to master branch, last one 2 years ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created
2019-11-16
15 commits to master branch, last one about a year ago
Atguigu (尚硅谷) Big Data Spark, latest 2019 edition: Spark learning materials
Created
2019-08-24
39 commits to master branch, last one 2 years ago