3 results found

423
2.0k
apache-2.0
54
news-please - an integrated web crawler and information extractor for news that just works
Created 2016-12-18
794 commits to master branch, last one 23 days ago
ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.
Created 2023-03-09
61 commits to master branch, last one 3 months ago
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Created 2020-12-04
434 commits to master branch, last one about a year ago