StabRise / spark-pdf

PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them

Date Created 2024-11-23 (5 months ago)
Commits 84 (last one 21 days ago)
Stargazers 66 (0 this week)
Watchers 2 (0 this week)
Forks 3
License agpl-3.0
Ranking

RepositoryStats indexes 652,531 repositories. Of these, StabRise/spark-pdf is ranked #416,203 (36th percentile) for total stargazers and #491,702 for total watchers. GitHub reports the primary language for this repository as Scala; among repositories using this language, it is ranked #1,501/2,052.

StabRise/spark-pdf is also tagged with popular topics, for which it is ranked: data-science (#1,697/2,265), pdf (#862/1,111), ocr (#489/669), spark (#427/560), big-data (#308/376), data-engineering (#230/349)

Other Information

StabRise/spark-pdf has GitHub issues enabled; there are 3 open issues and 8 closed issues.

There have been 5 releases; the latest was published on 2025-04-27 (21 days ago).

Homepage URL: https://stabrise.com/spark-pdf/

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in '23)

Recent Commit History

84 commits on the default branch (main) since Jan '22

Yearly Commits

Commits to the default branch (main) per year

Issue History

Languages

The only known language in this repository is Scala


updated: 2025-05-14 @ 10:49pm, id: 892965977 / R_kgDONTmUWQ