10 results found

263
3.7k
apache-2.0
31
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created 2019-04-08
1,581 commits to master branch, last one 7 hours ago
To extract main article from given URL with Node.js
Created 2015-11-29
755 commits to main branch, last one 24 days ago
120
461
apache-2.0
21
Readability / Html Content / Article Extractor & Web Scraping library written in PHP
This repository has been archived
Created 2014-09-24
238 commits to master branch, last one about a year ago
36
161
apache-2.0
11
SmartReader is a library to extract the main content of a web page, based on a port of the Readability library by Mozilla
Created 2017-09-26
407 commits to master branch, last one about a month ago
An article extractor in Rust
Created 2020-04-30
141 commits to master branch, last one 3 years ago
Parse markdown article, download images and replace image URLs with local paths
Created 2019-10-05
154 commits to master branch, last one 6 months ago
17
102
mit
11
Reddit bot to preview and post hyperlinks as comments
Created 2018-12-30
151 commits to master branch, last one 5 years ago
26
93
mit
12
NLP Web Service
Created 2016-12-20
83 commits to master branch, last one 2 years ago
Extract article or news by url or html, parse the title and content, output in markdown format.
Created 2020-09-23
117 commits to master branch, last one 5 months ago
5
51
apache-2.0
1
The best HTML to Markdown library, an ESM-native & Useful Utilities with simple, lightweight and epic quality.
Created 2023-11-03
324 commits to main branch, last one about a month ago