WZBSocialScienceCenter / pdftabextract

A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents.

Date Created 2016-07-08 (8 years ago)
Commits 171 (last one 2 years ago)
Stargazers 2,233 (0 this week)
Watchers 84 (0 this week)
Forks 372
License apache-2.0

Ranking

RepositoryStats indexes 610,149 repositories. Of these, WZBSocialScienceCenter/pdftabextract is ranked #23,438 (96th percentile) for total stargazers and #22,421 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language it is ranked #3,627/123,095.

WZBSocialScienceCenter/pdftabextract is also tagged with popular topics. Its rank within each topic is: python (#1,491/22,769), image-processing (#71/1,144), pdf (#104/1,032), ocr (#65/622), data-mining (#26/294).

Other Information

WZBSocialScienceCenter/pdftabextract has 1 open pull request on GitHub; 2 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 4 open issues and 18 closed issues.

Homepage URL: https://datascience.blog.wzb.eu/2017/02/16/data-mining-ocr-pdfs-using-pdftabextract-to-liberate-tabular-data-from-scanned-documents/

Star History

GitHub stargazers over time

[Chart: stargazers (0 to 2.5k) by year, 2017 to 2025]

Watcher History

GitHub watchers over time; collection started in 2023

[Chart: watchers (84 to 87) by month, Feb '23 to Dec '24]

Recent Commit History

2 commits on the default branch (master) since Jan '22

[Chart: commits (0 to 2) over time, Jul '22 to 2025]

Yearly Commits

Commits to the default branch (master) per year

[Chart: commits per year (0 to 80), 2016 to 2022 and 2024]

Issue History

[Chart: total, open, and closed issues (0 to 25) by year, 2018 to 2025]

Languages

The primary language is Python, but other languages are present as well.

Python, Makefile

updated: 2025-01-31 @ 03:05pm, id: 62884666 / R_kgDOA7-LOg