99 results found

Large Scale Chinese Corpus for NLP
Created 2019-02-08
75 commits to master branch, last one 5 months ago
1.3k
4.9k
unknown
180
A collection of small corpuses of interesting data for the creation of bots and similar stuff.
Created 2014-02-23
803 commits to master branch, last one 9 months ago
Search all Chinese NLP datasets, with commonly used English NLP datasets included
Created 2020-02-21
39 commits to master branch, last one about a year ago
540
4.0k
unknown
90
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Created 2019-11-22
527 commits to master branch, last one 5 months ago
994
4.0k
apache-2.0
104
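Chinese personal-name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.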
Created 2016-12-08
80 commits to master branch, last one 7 months ago
258
3.6k
apache-2.0
31
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Created 2019-04-08
1,570 commits to master branch, last one 2 days ago
Deep Learning and deep reinforcement learning research papers and some codes
Created 2016-11-28
2,833 commits to master branch, last one 8 months ago
Final Weibo Crawler. Scrape anything from Weibo: comments, Weibo contents, followers, anything. The Terminator
Created 2017-04-13
36 commits to master branch, last one 5 years ago
410
2.0k
apache-2.0
103
Awesome Chatbot Projects, Corpus, Papers, Tutorials. Chinese Chatbot =>:
Created 2017-09-01
11 commits to master branch, last one 7 years ago
498
2.0k
unknown
84
Corpora for training Chinese and English dialogue systems. Datasets for Training Chatbot System
Created 2017-03-14
9 commits to master branch, last one 7 years ago
1.1k
1.4k
bsd-3-clause
69
A multilingual dialog corpus
Created 2017-01-11
396 commits to master branch, last one 4 years ago
373
1.2k
apache-2.0
47
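Company-name corpus. Organization-name corpus. Company short names, abbreviations, brand terms, enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.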
Created 2018-10-10
46 commits to master branch, last one 7 months ago
A very comprehensive parallel corpus of Classical Chinese (ancient Chinese) and modern Chinese
Created 2022-01-11
28 commits to main branch, last one 6 months ago
:helicopter: Insurance industry corpus, chatbot
Created 2017-07-26
8 commits to master branch, last one 3 months ago
Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus)
Created 2020-01-25
25 commits to master branch, last one 2 years ago
208
872
unknown
15
Collections of Chinese NLP corpus
Created 2018-12-28
21 commits to master branch, last one 3 years ago
134
843
gpl-3.0
7
ChatGPT Chinese corpus: dialogue corpus, fiction corpus, customer-service corpus, for training large models
Created 2023-04-26
31 commits to main branch, last one 5 months ago
188
843
gpl-3.0
54
An R package for the Quantitative Analysis of Textual Data
Created 2012-08-15
11,441 commits to master branch, last one about a month ago
Chatbot in 200 lines of code using TensorLayer
Created 2017-09-05
41 commits to master branch, last one 5 years ago
110
808
mit
17
Crawl BookCorpus
Created 2018-07-14
50 commits to master branch, last one about a year ago
128
727
apache-2.0
18
Chinese medical information processing benchmark CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark
Created 2021-04-30
98 commits to main branch, last one about a year ago
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Created 2016-07-15
232 commits to master branch, last one about a year ago
80
696
cc-by-4.0
27
Korean corpus repository
Created 2020-08-14
451 commits to master branch, last one 3 years ago
92
695
gpl-3.0
28
An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation
Created 2018-08-23
1,450 commits to main branch, last one a day ago
❤️ Emotional First Aid Dataset: psychological counseling Q&A and chatbot corpus
Created 2020-04-22
7 commits to master branch, last one 3 months ago
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Created 2020-03-31
63 commits to master branch, last one 2 years ago
20
393
apache-2.0
7
[NeurIPS D&B 2024] Generative AI for Math: MathPile
Created 2023-11-27
33 commits to main branch, last one 10 days ago
61
363
unknown
8
Chinese NLP corpus of Chinese science fiction: about 4675 Chinese science fiction novels. Chinese science fiction NLP corpus, text corpus, and text database.
Created 2020-07-31
5 commits to master branch, last one 2 years ago
96
307
gpl-2.0
21
KH Coder: for Quantitative Content Analysis or Text Mining
Created 2018-05-03
3,363 commits to master branch, last one 27 days ago
106
301
apache-2.0
19
We gather Malaysian dataset! https://malaysian-dataset.readthedocs.io/
Created 2017-10-30
1,131 commits to master branch, last one 14 days ago