2 results found

13
66
gpl-3.0
13
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps such as changing case that you can all use to...
Created 2013-03-26
1,584 commits to master branch, last one 2 days ago
Taiwanese Hokkien Transliterator and Tokeniser
Created 2023-06-14
269 commits to main branch, last one 3 months ago