iamarunbrahma / pdf-to-markdown

Conversion of PDF documents to structured Markdown, optimized for Retrieval Augmented Generation (RAG) and other NLP tasks. Extract text, tables, and images with preserved formatting for enhanced information retrieval and processing.

Date Created 2024-09-10 (6 months ago)
Commits 26 (last one 4 months ago)
Stargazers 62 (0 this week)
Watchers 3 (0 this week)
Forks 6
License MIT
Ranking

RepositoryStats indexes 631,885 repositories. Of these, iamarunbrahma/pdf-to-markdown is ranked #424,988 (33rd percentile) for total stargazers and #417,222 for total watchers. GitHub reports the primary language for this repository as Python; among repositories using this language, it is ranked #82,350 of 128,927.

iamarunbrahma/pdf-to-markdown is also tagged with several popular topics, where it is ranked as follows: python (#17,370/23,400), rag (#475/685), information-retrieval (#177/230), retrieval-augmented-generation (#163/229).

Other Information

iamarunbrahma/pdf-to-markdown has 1 open pull request on GitHub, and no pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there is 1 open issue and 1 closed issue.

Star History

[Chart: GitHub stargazers over time, Oct '24 to Mar '25]

Watcher History

[Chart: GitHub watchers over time, Dec '24 to Mar '25; collection started in '23]

Recent Commit History

26 commits on the default branch (main) since Jan '22.

[Chart: commits over time, Oct '24 to Mar '25]

Yearly Commits

[Chart: commits to the default branch (main) per year, 2024]

Issue History

[Chart: total, open, and closed issues over time, Dec '24 to Mar '25]

Languages

The only known language in this repository is Python.

updated: 2025-03-21 @ 06:56am, id: 855288462 / R_kgDOMvqqjg