vim89 / datapipelines-essentials-python

Simplified ETL process in Hadoop using Apache Spark. Has a complete ETL pipeline for a data lake. Includes SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.

Date Created 2019-11-16 (5 years ago)
Commits 15 (last one about a year ago)
Stargazers 53 (0 this week)
Watchers 6 (0 this week)
Forks 37
License apache-2.0
Ranking

RepositoryStats indexes 595,856 repositories. Of these, vim89/datapipelines-essentials-python is ranked #451,650 (24th percentile) for total stargazers and #300,666 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language, it is ranked #86,528/119,431.
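The page does not say how the percentile is derived. As a minimal sketch, assuming it is the share of indexed repositories ranked below this one (the function name `percentile_from_rank` is hypothetical and not part of RepositoryStats), the stargazer figure above can be reproduced like this:

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Return the percentage of items ranked below `rank` out of `total`.

    Assumption: rank 1 is the best position, and the percentile is the
    share of the population with a worse (numerically larger) rank.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Stargazer ranking quoted above: #451,650 of 595,856 indexed repositories.
stargazer_percentile = percentile_from_rank(451_650, 595_856)
print(round(stargazer_percentile, 1))  # 24.2, consistent with "24th percentile"
```

The same function applies to the per-language ranking (#86,528 of 119,431), although the page does not quote a percentile for that figure.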

vim89/datapipelines-essentials-python is also tagged with popular topics. For these, it is ranked: python (#18,175/22,324), python3 (#3,220/4,145), xml (#489/566), spark (#465/540), big-data (#319/363), etl (#228/263), hadoop (#156/182), apache-spark (#91/111), pyspark (#82/110).

Other Information

vim89/datapipelines-essentials-python has GitHub issues enabled; there is 1 open issue and 0 closed issues.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in 2023)

Recent Commit History

1 commit on the default branch (master) since Jan '22

Yearly Commits

Commits to the default branch (master) per year

Issue History

Languages

The primary language is Python, but there are also others...

updated: 2024-08-11 @ 07:15pm, id: 222109927 / R_kgDODT0g5w