2 results found
Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation, and splits sentences. It offers several other basic preprocessing steps such as changing case that you can all use to...
Created 2013-03-26
1,579 commits to master branch, last one about a month ago
Taiwanese Hokkien Transliterator and Tokeniser
Created 2023-06-14
269 commits to main branch, last one 2 months ago