Search Results - RepositoryStats

news-please fhamborg

431

2.2k

apache-2.0

53

news-please - an integrated web crawler and information extractor for news that just works

Created 2016-12-18

802 commits to master branch, last one 5 months ago

chatWeb SkywalkerDarren

134

905

mit

19

ChatWeb can crawl web pages, read PDF, DOCX, TXT, and extract the main content, then answer your questions based on the content, or summarize the key points.

ai gpt pdf docx faiss openai chatgpt crawler pgvector embedding newspaper postgresql gpt-35-turbo news-extractor vector-database

Created 2023-03-09

61 commits to master branch, last one 9 months ago

extractnet currentslab

24

273

mit

5

A fork of Dragnet that also extracts the author, headline, date, and keywords from context, with built-in metadata extraction, all in one package

news python text-mining webscraping web-scraping news-articles text-cleaning news-extractor date-extraction news-extraction machine-learning author-extraction content-extraction

Created 2020-12-04

434 commits to master branch, last one 2 years ago