17 results found

92
695
gpl-3.0
28
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Created 2018-08-23
1,450 commits to main branch, last one a day ago
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Created 2020-03-31
63 commits to master branch, last one 2 years ago
66
450
unknown
45
Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German
Created 2018-06-12
88 commits to master branch, last one 7 days ago
48
278
unknown
15
A list of Indonesian NLP resources.
Created 2018-04-07
30 commits to master branch, last one 2 years ago
A web-based engine for creating and annotating textual corpora
Created 2014-03-27
2,786 commits to master branch, last one about a year ago
Crawler for linguistic corpora
Created 2017-09-08
392 commits to master branch, last one 11 months ago
14
162
apache-2.0
2
The pipeline for the OSCAR corpus
Created 2021-02-15
419 commits to main branch, last one 12 months ago
19
131
cc-by-4.0
5
Kanji usage frequency data collected from various sources
Created 2016-01-24
177 commits to master branch, last one 24 days ago
43
111
unknown
22
Data for the quantitative study of (Vedic) Sanskrit
Created 2018-08-18
49 commits to master branch, last one 8 days ago
6
86
apache-2.0
9
An asynchronous concurrent pipeline for classifying Common Crawl based on fastText's pipeline.
This repository has been archived
Created 2019-03-01
17 commits to master branch, last one 3 years ago
12
62
apache-2.0
4
Quran, Hadith, Translations, Tafaseer, Corpus Linguistics. Everything for NLP
Created 2022-02-20
137 commits to master branch, last one 7 months ago
10
61
mit
7
Large silver-standard Russian corpus with NER, morphology, and syntax markup
Created 2018-09-10
206 commits to master branch, last one about a year ago
22
60
gpl-2.0
11
An advanced, extensible web front-end for the Manatee-open corpus search engine
Created 2015-04-14
12,748 commits to master branch, last one a day ago
11
56
gpl-3.0
5
SpeCT - Speech Corpus Toolkit for Praat. Documentation: https://lennes.github.io/spect/
Created 2017-03-08
49 commits to master branch, last one about a year ago
A large high-quality corpus of Chinese synonyms
Created 2021-11-02
5 commits to main branch, last one 2 years ago