2 results found

gpl-3.0
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, that you can all use to...
Created 2013-03-26
1,579 commits to master branch, last one about a month ago
Taiwanese Hokkien Transliterator and Tokeniser
Created 2023-06-14
269 commits to main branch, last one 2 months ago