94 results found

Large Scale Chinese Corpus for NLP
Created 2019-02-08
75 commits to master branch, last one about a month ago
1.3k
4.9k
unknown
180
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Created 2014-02-23
803 commits to master branch, last one 4 months ago
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Created 2020-02-21
39 commits to master branch, last one about a year ago
541
3.9k
unknown
89
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Created 2019-11-22
527 commits to master branch, last one about a month ago
976
3.9k
apache-2.0
105
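Chinese personal name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.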
Created 2016-12-08
80 commits to master branch, last one 3 months ago
238
3.2k
apache-2.0
30
Python & command-line tool to gather text on the Web: Crawling & scraping, content extraction, metadata. TXT, Markdown, CSV & XML output.
Created 2019-04-08
1,522 commits to master branch, last one a day ago
Deep Learning and deep reinforcement learning research papers and some codes
Created 2016-11-28
2,833 commits to master branch, last one 3 months ago
Final Weibo Crawler. Scrape anything from Weibo: comments, Weibo contents, followers, anything. The Terminator
Created 2017-04-13
36 commits to master branch, last one 4 years ago
500
2.0k
unknown
84
Datasets for Training Chatbot System: corpora for training Chinese and English dialogue systems
Created 2017-03-14
9 commits to master branch, last one 6 years ago
408
2.0k
apache-2.0
104
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Created 2017-09-01
11 commits to master branch, last one 6 years ago
1.2k
1.4k
bsd-3-clause
69
A multilingual dialog corpus
Created 2017-01-11
396 commits to master branch, last one 3 years ago
373
1.2k
apache-2.0
47
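Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.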
Created 2018-10-10
46 commits to master branch, last one 3 months ago
🚁 Insurance industry corpus, chatbot
Created 2017-07-26
7 commits to master branch, last one about a month ago
A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese
Created 2022-01-11
28 commits to main branch, last one 2 months ago
Large-scale Pre-training Corpus for Chinese (100G)
Created 2020-01-25
25 commits to master branch, last one about a year ago
209
859
unknown
15
Collections of Chinese NLP corpus
Created 2018-12-28
21 commits to master branch, last one 3 years ago
Chatbot in 200 lines of code using TensorLayer
Created 2017-09-05
41 commits to master branch, last one 4 years ago
186
833
gpl-3.0
53
An R package for the Quantitative Analysis of Textual Data
Created 2012-08-15
11,330 commits to master branch, last one about a month ago
132
799
gpl-3.0
7
ChatGPT Chinese corpus: dialogue corpus, novel corpus, customer-service corpus, for training large models
Created 2023-04-26
31 commits to main branch, last one about a month ago
111
792
mit
17
Crawl BookCorpus
Created 2018-07-14
50 commits to master branch, last one 11 months ago
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Created 2016-07-15
232 commits to master branch, last one 9 months ago
122
691
apache-2.0
18
Chinese medical information processing benchmark CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Created 2021-04-30
98 commits to main branch, last one about a year ago
87
673
gpl-3.0
28
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Created 2018-08-23
1,430 commits to main branch, last one a day ago
80
666
cc-by-4.0
26
Korean corpus repository
Created 2020-08-14
451 commits to master branch, last one 3 years ago
❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus
Created 2020-04-22
6 commits to master branch, last one about a month ago
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Created 2020-03-31
63 commits to master branch, last one 2 years ago
19
353
apache-2.0
7
Generative AI for Math: MathPile
Created 2023-11-27
30 commits to main branch, last one 6 days ago
60
337
unknown
8
Chinese NLP corpus of Chinese science fiction: about 4,675 Chinese science fiction novels. A Chinese science fiction text corpus and text database for natural language processing.
Created 2020-07-31
5 commits to master branch, last one about a year ago
96
308
gpl-2.0
21
KH Coder: for Quantitative Content Analysis or Text Mining
Created 2018-05-03
3,359 commits to master branch, last one 17 days ago
104
290
apache-2.0
19
We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/
Created 2017-10-30
1,109 commits to master branch, last one 17 days ago