Search Results - RepositoryStats

Chinese-Word-Vectors Embedding

2.3k

11.9k

apache-2.0

284

100+ pre-trained Chinese word vectors

chinese embedding embeddings vectors-trained word-embeddings chinese-word-segmentation

Created 2018-01-09

134 commits to master branch, last one about a year ago

pkuseg-python lancopku

989

6.6k

mit

208

The pkuseg toolkit for multi-domain Chinese word segmentation

chinese-word-segmentation

Created 2018-08-05

200 commits to master branch, last one 2 years ago

lac baidu

595

3.9k

apache-2.0

105

Baidu NLP: word segmentation, part-of-speech tagging, named entity recognition, word importance

java python chinese-nlp lexical-analysis word-segmentation part-of-speech-tagger named-entity-recognition chinese-word-segmentation

Created 2018-07-02

155 commits to master branch, last one 3 years ago

Jiagu ownthink

614

3.4k

mit

87

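Jiagu: a deep learning natural language processing tool covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, and text clustering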

cws ner nlp pos chinese-word-segmentation

Created 2018-12-30

107 commits to master branch, last one 2 years ago

SymSpell wolfgarbe

300

3.2k

mit

71

SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

spelling symspell spellcheck levenshtein spell-check fuzzy-search edit-distance fuzzy-matching text-segmentation word-segmentation damerau-levenshtein spelling-correction levenshtein-distance chinese-text-segmentation chinese-word-segmentation approximate-string-matching

Created 2014-03-25

490 commits to master branch, last one 6 days ago

pyhanlp hankcs

810

3.2k

apache-2.0

85

Chinese word segmentation

hanlp dependency-parser part-of-speech-tagger named-entity-recognition chinese-word-segmentation natural-language-processing

Created 2018-03-19

242 commits to master branch, last one 12 days ago

ChineseNLP didi

271

1.8k

unknown

60

Datasets and SOTA results for every field of Chinese NLP

nlp nlp-tasks chinese-nlp entity-linking question-answering machine-translation chinese-word-segmentation

Created 2019-05-16

278 commits to master branch, last one 3 years ago

jcseg lionsoul2014

212

916

apache-2.0

91

Jcseg is a lightweight NLP framework developed in Java. It provides CJK and English segmentation based on the MMSEG algorithm, along with keyword extraction, key sentence extraction, summary extraction imp...

nlp java jcseg mmseg chinese-nlp pos-tagging solr-plugin jcseg-analyzer lucene-analyzer lucene-tokenizer keywords-extraction opensearch-analyzer opensearch-tokenizer elasticsearch-analyzer elasticsearch-tokenizer nlp-keywords-extraction chinese-text-segmentation chinese-word-segmentation natural-language-processing

Created 2014-03-31

680 commits to master branch, last one about a year ago

symspellpy mammothb

121

810

mit

16

Python port of SymSpell: 1 million times faster spelling correction & fuzzy search through Symmetric Delete spelling correction algorithm

python spelling symspell spellcheck levenshtein spell-check fuzzy-search edit-distance fuzzy-matching text-segmentation word-segmentation damerau-levenshtein spelling-correction levenshtein-distance chinese-text-segmentation chinese-word-segmentation approximate-string-matching

Created 2018-08-13

294 commits to master branch, last one about a month ago

jieba-rs messense

48

778

mit

13

The Jieba Chinese Word Segmentation Implemented in Rust

nlp wasm jieba jieba-chinese chinese-word-segmentation

Created 2018-05-06

325 commits to main branch, last one 7 days ago

friso lionsoul2014

92

494

apache-2.0

33

High-performance Chinese tokenizer with both GBK and UTF-8 charset support, based on the MMSEG algorithm and developed in ANSI C. Completely based on a modular implementation and can be easily embedded in other...

c tokenizer cjk-tokenizer php-tokenizer full-text-search korean-tokenizer chinese-tokenizer japanese-tokenizer chinese-word-segmentation

Created 2014-03-31

148 commits to master branch, last one about a year ago

monpa monpa-team

26

245

other

23

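MONPA (罔拍) is a multi-task model that provides Traditional Chinese word segmentation, part-of-speech tagging, and named entity recognition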

ner nlp pos bert albert pos-tagging word-segmentation named-entity-recognition chinese-word-segmentation

Created 2019-07-23

54 commits to master branch, last one 2 years ago

WordSeg hemingkx

42

206

unknown

1

A PyTorch implementation of a BiLSTM / BERT / RoBERTa (+ BiLSTM + CRF) model for Chinese Word Segmentation.

bert pytorch roberta bert-crf bilstm-crf chinese-word-segmentation

Created 2021-03-25

41 commits to main branch, last one 2 years ago

MicroTokenizer howl-anderson

22

146

mit

9

A lightweight, full-featured Chinese tokenizer that helps students understand how tokenizers work. MicroTokenizer: A lightweight Chinese tokenizer designed for educational and research purposes. Provides a practical, hands-on approach to understanding NLP concepts, fe...

tokenizer chinese-nlp dag-network chinese-tokenizer educational-project nlp-machine-learning chinese-word-segmentation

Created 2018-06-12

396 commits to master branch, last one 3 months ago

thulac4j yizhiru

31

84

apache-2.0

11

Chinese Word Segmentation Tool, a Java implementation of THULAC.

thulac chinese-word-segmentation

Created 2017-03-03

33 commits to master branch, last one 4 years ago

jiojio dongrixinyu

7

46

gpl-3.0

1

A convenient Chinese word segmentation tool

crf python chinese-nlp wordsegmentation partofspeech-tagger chinese-word-segmentation

Created 2021-11-10

165 commits to master branch, last one 19 days ago