7 results found
- Filter by Primary Language:
- Python (3)
- HTML (1)
- JavaScript (1)
- Jupyter Notebook (1)
- R (1)
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created
2019-04-08
1,576 commits to master branch, last one a day ago
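Text mining and preprocessing tools (text cleaning, new word discovery, sentiment analysis, entity recognition and linking, keyword extraction, knowledge extraction, syntactic parsing, etc.), using unsupervised or weakly supervised methods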
Created
2018-11-19
98 commits to master branch, last one 6 months ago
🧹 Python package for text cleaning
Created
2018-12-06
83 commits to main branch, last one 2 years ago
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M o...
Created
2024-08-04
190 commits to main branch, last one 2 months ago
Tools for cleaning and normalizing text data
Created
2016-01-07
231 commits to master branch, last one 3 years ago
A fork of Dragnet that also extracts author, headline, date, and keywords from context, with built-in metadata extraction, all in one package
Created
2020-12-04
434 commits to master branch, last one about a year ago
Grammarify is an npm package that safely cleans up text containing misspellings, improper capitalization, and lexical illusions, among other things.
Created
2018-04-22
31 commits to master branch, last one 2 years ago