3 results found

91 · 495 · License: apache-2.0 · 32
High-performance Chinese tokenizer written in ANSI C, based on the MMSEG algorithm, with support for both the GBK and UTF-8 charsets. Fully modular in implementation and can be easily embedded in other...
Created 2014-03-31
148 commits to master branch, last one about a year ago

A lightweight, full-featured Chinese tokenizer that helps students understand how tokenizers work. MicroTokenizer: A lightweight Chinese tokenizer designed for educational and research purposes. Provides a practical, hands-on approach to understanding NLP concepts, fe...
Created 2018-06-12
396 commits to master branch, last one 5 months ago

An NLP package for Chinese text: preprocessing, tokenization, Chinese fonts, word embeddings, text similarity and sentiment analysis. A lightweight Chinese natural language processing package.
Created 2023-10-07
198 commits to main branch, last one 4 months ago