12 results found

287
4.1k
apache-2.0
31
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created 2019-04-08
1,594 commits to master branch, last one 4 days ago
93
712
gpl-3.0
26
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Created 2018-08-23
1,495 commits to main branch, last one 6 days ago
85
358
mit
7
A very simple news crawler with a funny name
Created 2022-10-28
2,691 commits to master branch, last one 9 days ago
22
260
cc-by-4.0
12
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Created 2021-01-19
114 commits to main branch, last one about a year ago
12
154
mit
5
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Created 2021-01-18
233 commits to main branch, last one 4 months ago
27
136
mit
11
Python library for handling audio datasets.
Created 2017-11-27
600 commits to master branch, last one 4 years ago
OpusFilter - Parallel corpus processing toolkit
Created 2019-11-06
270 commits to develop branch, last one about a month ago
Utilities for Processing the Switchboard Dialogue Act Corpus
Created 2018-11-14
48 commits to master branch, last one 4 years ago
22
64
gpl-2.0
10
An advanced, extensible web front-end for the Manatee-open corpus search engine
Created 2015-04-14
13,011 commits to master branch, last one 2 days ago
11
57
gpl-3.0
5
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
Created 2017-03-08
49 commits to master branch, last one about a year ago