16 results found

2.3k
11.9k
apache-2.0
285
100+ Chinese Word Vectors (over a hundred sets of pre-trained Chinese word vectors)
Created 2018-01-09
134 commits to master branch, last one about a year ago
The pkuseg toolkit for multi-domain Chinese word segmentation
Created 2018-08-05
200 commits to master branch, last one 2 years ago
595
3.9k
apache-2.0
105
Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance
Created 2018-07-02
155 commits to master branch, last one 3 years ago
614
3.3k
mit
87
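Jiagu, a deep-learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering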
Created 2018-12-30
107 commits to master branch, last one 2 years ago
298
3.2k
mit
71
SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Created 2014-03-25
488 commits to master branch, last one about a month ago
809
3.1k
apache-2.0
85
Chinese word segmentation
Created 2018-03-19
218 commits to master branch, last one 8 months ago
271
1.8k
unknown
60
Datasets and SOTA results for every field of Chinese NLP
Created 2019-05-16
278 commits to master branch, last one 3 years ago
212
914
apache-2.0
92
Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, summary extraction imp...
Created 2014-03-31
680 commits to master branch, last one about a year ago
Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm
Created 2018-08-13
294 commits to master branch, last one 19 hours ago
The Jieba Chinese Word Segmentation Implemented in Rust
Created 2018-05-06
314 commits to main branch, last one 6 months ago
92
485
apache-2.0
33
High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Fully modular implementation that can be easily embedded in other...
Created 2014-03-31
148 commits to master branch, last one about a year ago
26
245
other
23
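MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition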
Created 2019-07-23
54 commits to master branch, last one 2 years ago
42
204
unknown
1
A PyTorch implementation of a BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) model for Chinese Word Segmentation.
Created 2021-03-25
41 commits to main branch, last one 2 years ago
A lightweight, full-featured Chinese tokenizer that helps students understand how tokenizers work. MicroTokenizer: A lightweight Chinese tokenizer designed for educational and research purposes. Provides a practical, hands-on approach to understanding NLP concepts, fe...
Created 2018-06-12
396 commits to master branch, last one 2 months ago
33
85
apache-2.0
11
Chinese Word Segmentation Tool, a Java implementation of THULAC.
Created 2017-03-03
33 commits to master branch, last one 4 years ago
7
46
gpl-3.0
1
A convenient Chinese word segmentation tool
Created 2021-11-10
161 commits to master branch, last one 4 months ago