aeksco / aws-pdf-textract-pipeline

Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript.

Date Created 2020-02-24 (5 years ago)
Commits 592 (last one 11 months ago)
Stargazers 165 (0 this week)
Watchers 3 (0 this week)
Forks 18
License MIT
Ranking

RepositoryStats indexes 622,366 repositories. Of these, aeksco/aws-pdf-textract-pipeline is ranked #213,674 (66th percentile) for total stargazers and #435,531 for total watchers. GitHub reports the primary language for this repository as TypeScript; among repositories using this language, it is ranked #16,045/48,777.

aeksco/aws-pdf-textract-pipeline is also tagged with popular topics, for which it is ranked: typescript (#4,162/10,472), aws (#992/2,562), serverless (#603/1,305), pdf (#541/1,054), jest (#225/541), lambda (#163/402), puppeteer (#153/337), webscraping (#82/200).

Other Information

aeksco/aws-pdf-textract-pipeline has 5 open pull requests on GitHub, and 278 pull requests have been merged over the lifetime of the repository.

Star History

GitHub stargazers over time

[Chart: stargazers (y-axis 0 to 180), 2021 to 2025]

Watcher History

GitHub watchers over time (collection started in 2023)

[Chart: watchers (y-axis 2 to 3), Feb 2023 to Feb 2025]

Recent Commit History

542 commits on the default branch (main) since January 2022

[Chart: cumulative commits (y-axis 0 to 600), July 2022 to 2025]

Yearly Commits

Commits to the default branch (main) per year

[Chart: commits per year (y-axis 0 to 350) for 2020, 2021, 2022, and 2024]

Issue History

[Chart: total, open, and closed issues (y-axis 0 to 8), 2021 to 2025]

Languages

The primary language is TypeScript, but there are also others.

TypeScript, JavaScript

updated: 2025-02-09 @ 09:26am, id: 242643811 / R_kgDODnZzYw