36 results found

4.3k
25.2k
bsd-2-clause
576
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Created 2013-10-28
7,751 commits to master branch, last one 11 days ago
310
2.0k
mit
92
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Created 2019-04-22
372 commits to main branch, last one about a year ago
864
2.0k
apache-2.0
64
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Created 2017-12-18
3,972 commits to master branch, last one 2 days ago
240
1.6k
bsd-3-clause
56
A Scala kernel for Jupyter
Created 2015-03-10
1,550 commits to main branch, last one 18 hours ago
690
1.1k
apache-2.0
40
This is the github repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Created 2019-02-10
184 commits to master branch, last one 2 years ago
368
1.0k
apache-2.0
38
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
Created 2021-12-06
3,167 commits to main branch, last one 21 hours ago
Big data platform for e-commerce user behavior analysis
Created 2018-06-21
45 commits to master branch, last one 4 years ago
133
551
apache-2.0
30
Qubole Sparklens tool for performance tuning Apache Spark
Created 2018-03-16
54 commits to master branch, last one 2 years ago
The Internals of Spark SQL
Created 2017-12-26
1,540 commits to main branch, last one 5 days ago
87
376
bsd-3-clause
16
New Generation Open-Source Data Stack Demo
Created 2022-07-03
57 commits to main branch, last one about a year ago
🐍 Quick reference guide to common patterns & functions in PySpark.
Created 2019-03-07
32 commits to master branch, last one about a year ago
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsigh...
Created 2019-03-14
942 commits to master branch, last one 21 days ago
28
283
apache-2.0
12
Use SQL to build ELT pipelines on a data lakehouse.
Created 2021-03-11
481 commits to main branch, last one 2 years ago
Apache Spark™ and Scala Workshops
Created 2016-03-10
318 commits to gh-pages branch, last one about a year ago
17
199
apache-2.0
7
Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!
Created 2021-09-23
1,037 commits to main branch, last one 3 days ago
Spark Structured Streaming / Kafka / Cassandra / Elastic
Created 2017-06-15
25 commits to master branch, last one 5 years ago
74
175
apache-2.0
16
An encrypted data analytics platform
Created 2016-10-31
675 commits to master branch, last one about a year ago
Apache Spark 3 - Structured Streaming Course Material
Created 2020-07-21
29 commits to master branch, last one 3 years ago
49
110
apache-2.0
35
Spark Connector to read and write with Pulsar
Created 2019-07-01
192 commits to master branch, last one about a month ago
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
Created 2021-06-27
18 commits to hudi branch, last one 2 years ago
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Created 2018-03-26
60 commits to master branch, last one 3 years ago
This repository helps you learn Databricks concepts through examples. It will include all the important topics we need in real-life work as a data engineer. We wil...
Created 2022-05-10
45 commits to master branch, last one about a year ago
Apache Spark Course Material
Created 2020-05-05
34 commits to master branch, last one 3 years ago
Tutorials on Big Data essentials: Hadoop, MapReduce, Spark.
Created 2019-08-27
290 commits to master branch, last one 5 days ago
bring sf to spark in production
Created 2019-01-11
143 commits to master branch, last one 2 years ago
42
55
unknown
19
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that re...
Created 2016-05-04
191 commits to master branch, last one 2 years ago
7
54
bsd-3-clause
4
New generation open-source data stack
Created 2022-05-20
8 commits to main branch, last one 2 years ago
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...
Created 2019-11-16
15 commits to master branch, last one about a year ago
尚硅谷 (Atguigu) Big Data Spark, latest 2019 edition: Spark learning materials
Created 2019-08-24
39 commits to master branch, last one about a year ago
Data Engineering examples covering Airflow and Mage for workflows; dbt for BigQuery, Redshift, ClickHouse; Spark and Kafka for Batch/Streaming Processing
Created 2023-01-19
120 commits to master branch, last one about a month ago