Search Results - RepositoryStats

4.4k

27.2k

bsd-2-clause

574

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

bi mysql spark athena python redash bigquery redshift analytics dashboard spark-sql databricks javascript postgresql hacktoberfest visualization business-intelligence

Created 2013-10-28

7,863 commits to master branch, last one a day ago

kyuubi apache

936

2.2k

apache-2.0

63

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

sql hive jdbc spark hadoop thrift data-lake spark-sql kubernetes hacktoberfest

Created 2017-12-18

4,267 commits to master branch, last one a day ago

spark dotnet

322

2.0k

mit

84

.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.

Created 2019-04-22

383 commits to main branch, last one 6 days ago

almond almond-sh

248

1.6k

bsd-3-clause

56

A Scala kernel for Jupyter

repl scala spark jupyter spark-sql jupyter-kernels jupyter-notebook

Created 2015-03-10

1,631 commits to main branch, last one 29 days ago

incubator-gluten apache

474

1.3k

apache-2.0

40

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

simd arrow velox spark-sql clickhouse vectorization

Created 2021-12-06

4,763 commits to main branch, last one 17 hours ago

LearningSparkV2 databricks

762

1.3k

apache-2.0

40

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]

mllib spark mlflow spark-sql delta-lake spark-mllib apache-spark structured-streaming

Created 2019-02-10

185 commits to master branch, last one 2 months ago

UserActionAnalyzePlatform oeljeklaus-you

387

1.0k

apache-2.0

58

Big data platform for analyzing e-commerce user behavior

java kyro spark hadoop spark-sql sparkjava accumulator

Created 2018-06-21

45 commits to master branch, last one 5 years ago

sparklens qubole

140

573

apache-2.0

28

Qubole Sparklens tool for performance tuning Apache Spark

scala spark cluster spark-ml scheduler spark-job spark-sql sparkjava scheduling simulation performance spark-mllib performance-tuning spark-applications performance-metrics performance-analysis performance-visualization

Created 2018-03-16

54 commits to master branch, last one 3 years ago

pyspark-cheatsheet kevinschaich

167

512

mit

7

🐍 Quick reference guide to common patterns & functions in PySpark.

data docs cheat guide spark guides pyspark reference spark-sql cheatsheet quickstart references cheatsheets data-science documentation pyspark-tutorial

Created 2019-03-07

32 commits to master branch, last one 2 years ago

spark-sql-internals japila-books

132

463

apache-2.0

15

The Internals of Spark SQL

book spark internals spark-sql apache-spark mkdocs-material

Created 2017-12-26

1,554 commits to main branch, last one 2 months ago

ngods-stocks zsvoboda

100

428

bsd-3-clause

16

New Generation Opensource Data Stack Demo

dbt cube spark trino python dagster datahub iceberg trinodb metabase spark-sql

Created 2022-07-03

57 commits to main branch, last one 2 years ago

data-accelerator microsoft

91

301

mit

29

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsigh...

Created 2019-03-14

961 commits to master branch, last one 2 months ago

cuelake cuebook

28

285

apache-2.0

11

Use SQL to build ELT pipelines on a data lakehouse.

elt etl sql delta upsert datalake data-lake lakehouse pipelines spark-sql apache-spark data-pipeline data-transfer apache-iceberg data-ingestion data-engineering data-integration zeppelin-notebook incremental-updates

Created 2021-03-11

481 commits to main branch, last one 2 years ago

spark-workshop jaceklaskowski

148

264

apache-2.0

29

Apache Spark™ and Scala Workshops

spark workshop spark-sql spark-mllib apache-spark spark-workshops spark-structured-streaming

Created 2016-03-10

318 commits to gh-pages branch, last one 2 years ago

qbeast-spark Qbeast-io

21

225

apache-2.0

10

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

scala spark big-data sampling spark-sql datasource data-lakehouse

Created 2021-09-23

1,106 commits to main branch, last one 2 months ago

Spark-Structured-Streaming-Examples polomarcus

78

183

apache-2.0

10

Spark Structured Streaming / Kafka / Cassandra / Elastic

kafka spark cassandra spark-sql structured-streaming

Created 2017-06-15

25 commits to master branch, last one 6 years ago

opaque-sql mc2-project

73

182

apache-2.0

17

An encrypted data analytics platform

spark enclave privacy security analytics spark-sql machine-learning

Created 2016-10-31

675 commits to master branch, last one 2 years ago

Spark-Streaming-In-Python LearningJournal

159

121

mit

7

Apache Spark 3 - Structured Streaming Course Material

python bigdata pyspark big-data data-lake spark-sql apache-spark spark-streaming

Created 2020-07-21

29 commits to master branch, last one 4 years ago

Real-time-Data-Warehouse izhangzhihao

43

113

unknown

3

Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi

Created 2021-06-27

18 commits to hudi branch, last one 3 years ago

pulsar-spark streamnative

50

113

apache-2.0

32

Spark Connector to read and write with Pulsar

flink spark spark-sql apache-spark data-science apache-pulsar data-processing batch-processing stream-processing structured-streaming

Created 2019-07-01

192 commits to master branch, last one 11 months ago

spark-connect-rs sjrusso8

17

106

apache-2.0

4

Apache Spark Connect Client for Rust

spark spark-sql grpc-client spark-connect

Created 2023-09-18

86 commits to main branch, last one about a month ago

ApacheSpark martandsingh

63

97

unknown

11

This repository will help you to learn about databricks concept with the help of examples. It will include all the important topics which we need in our real life experience as a data engineer. We wil...

etl sql hive spark hadoop pyspark database datalake deltalake spark-sql databricks timetravel apachespark etl-pipeline data-analysis spark-streaming data-engineering

Created 2022-05-10

45 commits to master branch, last one 2 years ago

Movies-Analytics-in-Spark-and-Scala Thomas-George-T

53

94

apache-2.0

5

Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.

rdd scala spark hadoop movies big-data analytics spark-rdd spark-sql case-study dataframes spark-scala shell-script spark-programs spark-dataframes big-data-projects movielens-dataset big-data-analytics movielens-data-analysis

Created 2018-03-26

60 commits to master branch, last one 3 years ago

SparkProgrammingInScala LearningJournal

159

88

mit

9

Apache Spark Course Material

scala spark bigdata big-data datalake data-lake spark-sql spark-scala apache-spark

Created 2020-05-05

34 commits to master branch, last one 4 years ago

big_data groda

26

73

mit

2

Tutorials on Big Data essentials: Hadoop, MapReduce, Spark. Explore a variety of tutorials and demonstrations on Big Data technologies, primarily in the form of Jupyter notebooks. Most notebooks are s...

mrjob spark bigtop docker hadoop bigdata pyspark big-data mapreduce spark-sql testdfsio hadoop-hdfs apache-spark apache-sedona hadoop-cluster mapreduce-bash gutenberg-ebooks hadoop-mapreduce jupyter-notebook

Created 2019-08-27

370 commits to master branch, last one 3 months ago

ngods zsvoboda

9

65

bsd-3-clause

4

New generation opensource data stack

sql data jdbc scala spark trino presto python iceberg trinodb prestodb sparksql analytics prestosql spark-sql data-pipeline

Created 2022-05-20

8 commits to main branch, last one 2 years ago

geospark harryprince

17

57

unknown

8

bring sf to spark in production

r gis spark-sql apache-spark spatial-queries spatial-analysis sparklyr-extension large-scale-spatial-analysis

Created 2019-01-11

143 commits to master branch, last one 3 years ago

Spark spider-123-eng

42

55

unknown

19

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that re...

Created 2016-05-04

191 commits to master branch, last one 3 years ago

datapipelines-essentials-python vim89

38

53

apache-2.0

4

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformati...

etl xml spark hadoop python pyspark python3 big-data datalake spark-sql hadoop-hdfs xml-parsing apache-spark etl-pipeline data-pipeline etl-framework etl-components hadoop-mapreduce

Created 2019-11-16

15 commits to master branch, last one about a year ago

spark_learning sjyttkl

54

51

unknown

3

Atguigu (尚硅谷) Big Data Spark, latest 2019 edition: Spark learning

spark spark-sql spark-core

Created 2019-08-24

39 commits to master branch, last one 2 years ago