3 results found

Normalize a URL
Created 2015-01-11
166 commits to main branch, last one 9 months ago
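URL normalization can be sketched with Python's standard library alone. This is a minimal illustration, not the listed project's API or rule set. The function name `normalize_url` and the specific rules are assumptions chosen for the example: lowercase the scheme and host, drop default ports, fragments and trailing slashes, and sort query parameters.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def normalize_url(url: str) -> str:
    """Hypothetical normalizer; the rules below are illustrative assumptions."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # .hostname is already lowercased and excludes any user:password@ prefix,
    # so this sketch silently drops credentials.
    host = parts.hostname or ""
    if ":" in host:
        # Re-add brackets around IPv6 literals.
        host = f"[{host}]"
    port = parts.port
    # Drop ports that are the default for the scheme.
    if (scheme == "http" and port == 80) or (scheme == "https" and port == 443):
        port = None
    netloc = host if port is None else f"{host}:{port}"
    # Remove trailing slashes but keep a root path of "/".
    path = parts.path.rstrip("/") or "/"
    # Sort query parameters so equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is discarded.
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("HTTP://Example.COM:80/a/b/?b=2&a=1#frag"))
```

Which rules are "safe" depends on the site. Trailing slashes and parameter order can be significant, so a real tool typically makes each rule optional.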
Extract and decompose URLs (including emails, which are conceptually a part of URLs) with robust patterns.
Created 2019-01-22
68 commits to master branch, last one 4 months ago
9
127
apache-2.0
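Here is a minimal sketch of extracting URLs and email addresses from free text and decomposing them into their parts, using Python's `re` and `urllib.parse`. The patterns and the helper names `extract_urls` and `extract_emails` are hypothetical and far less robust than the patterns the entry above refers to. For example, trailing punctuation is stripped naively, which also removes a closing parenthesis that genuinely belongs to a URL.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple, assumed patterns; not those of the listed project.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_urls(text: str) -> list[dict]:
    """Find http(s) URLs in text and split each into components."""
    results = []
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that commonly trails a URL.
        url = match.group(0).rstrip(".,;:!?)")
        parts = urlsplit(url)
        results.append({
            "url": url,
            "scheme": parts.scheme,
            "host": parts.hostname,
            "path": parts.path,
            "query": parts.query,
        })
    return results


def extract_emails(text: str) -> list[dict]:
    """Find email addresses and split them into user and domain."""
    results = []
    for match in EMAIL_RE.finditer(text):
        email = match.group(0)
        user, _, domain = email.rpartition("@")
        results.append({"email": email, "user": user, "domain": domain})
    return results


sample = "See https://example.com/docs?q=1, or mail info@example.org."
print(extract_urls(sample))
print(extract_emails(sample))
```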
3
Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters
Created 2015-07-07
317 commits to master branch, last one 19 days ago
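This is a rough sketch of a clean, filter, deduplicate and sample pipeline of the kind described above, written in Python because the entry mentions Python. It covers only a scheme check, a hypothetical file-extension blocklist, deduplication on a lightly canonicalized form, and a seeded per-host sample. The spam, content and language filters are not modeled, and none of the names (`clean`, `clean_and_sample`, `SKIP_EXTENSIONS`) come from the listed project.

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit

# Hypothetical blocklist: extensions unlikely to point at HTML pages.
SKIP_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".pdf", ".zip", ".mp4")


def clean(url: str):
    """Return a canonical form of url, or None if it should be filtered out."""
    parts = urlsplit(url.strip())
    if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
        return None
    if parts.path.lower().endswith(SKIP_EXTENSIONS):
        return None
    # Lowercase scheme and netloc (this simple version also lowercases any
    # user info) and drop the fragment so near-duplicates collapse.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path or "/", parts.query, ""))


def clean_and_sample(urls, per_host=2, seed=0):
    """Filter and deduplicate URLs, then keep at most per_host URLs per host."""
    seen = set()
    by_host = defaultdict(list)
    for url in urls:
        cleaned = clean(url)
        if cleaned is None or cleaned in seen:
            continue
        seen.add(cleaned)
        by_host[urlsplit(cleaned).hostname].append(cleaned)
    # A fixed seed makes the sample reproducible between runs.
    rng = random.Random(seed)
    sample = []
    for host in sorted(by_host):
        candidates = by_host[host]
        sample.extend(rng.sample(candidates, min(per_host, len(candidates))))
    return sample


urls = ["https://a.com/1", "https://A.com/1#x", "https://a.com/2", "https://a.com/3",
        "ftp://a.com/f", "https://b.com/img.PNG", "https://b.com/page"]
print(clean_and_sample(urls))
```

Capping URLs per host is one simple way to keep a crawl frontier from being dominated by a few large sites. Uniform random sampling within each host is an assumption here, not necessarily what the listed tool does.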