aeksco / aws-pdf-textract-pipeline

🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript

Date Created 2020-02-24 (4 years ago)
Commits 592 (last one 6 months ago)
Stargazers 162 (0 this week)
Watchers 3 (0 this week)
Forks 18
License MIT
Ranking

RepositoryStats indexes 565,279 repositories. Of these, aeksco/aws-pdf-textract-pipeline is ranked #203,056 (64th percentile) for total stargazers and #413,827 for total watchers. GitHub reports the primary language for this repository as TypeScript; among repositories using this language, it is ranked #14,623/42,955.

aeksco/aws-pdf-textract-pipeline is also tagged with popular topics, for which it is ranked: typescript (#3,918/9,613), aws (#967/2,436), serverless (#593/1,255), pdf (#505/963), jest (#223/526), lambda (#165/387), puppeteer (#151/317), webscraping (#65/166)

Other Information

aeksco/aws-pdf-textract-pipeline has 5 open pull requests on GitHub; 278 pull requests have been merged over the lifetime of the repository.

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time (collection started in '23)

Recent Commit History

542 commits on the default branch (main) since Jan '22

Yearly Commits

Commits to the default branch (main) per year

Issue History

Languages

The primary language is TypeScript, but there are also others.

updated: 2024-09-18 @ 01:28pm, id: 242643811 / R_kgDODnZzYw