10 results found

258
3.6k
apache-2.0
31
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created 2019-04-08
1,570 commits to master branch, last one 2 days ago
Extract the main article from a given URL with Node.js
Created 2015-11-29
753 commits to main branch, last one 11 days ago
121
459
apache-2.0
21
Readability / HTML Content / Article Extractor & Web Scraping library written in PHP
This repository has been archived
Created 2014-09-24
238 commits to master branch, last one about a year ago
36
160
apache-2.0
11
SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla
Created 2017-09-26
407 commits to master branch, last one 28 days ago
An article extractor in Rust
Created 2020-04-30
141 commits to master branch, last one 3 years ago
Parse a Markdown article, download its images, and replace image URLs with local paths
Created 2019-10-05
154 commits to master branch, last one 5 months ago
17
102
mit
11
Reddit bot to preview and post hyperlinks as comments
Created 2018-12-30
151 commits to master branch, last one 5 years ago
25
93
mit
12
NLP Web Service
Created 2016-12-20
83 commits to master branch, last one 2 years ago
5
51
apache-2.0
1
The best HTML to Markdown library: ESM-native, with useful utilities, simple, lightweight, and of epic quality.
Created 2023-11-03
324 commits to main branch, last one 10 days ago
Extract an article or news item from a URL or HTML, parse the title and content, and output in Markdown format.
Created 2020-09-23
117 commits to master branch, last one 4 months ago