11 results found
- Filter by Primary Language:
- Python (2)
- C++ (2)
- JavaScript (2)
- Rust (1)
- C (1)
- Scala (1)
- Go (1)
- Java (1)
Four word embedding models implemented in Python. Supporting arbitrary context features
Created
2017-07-16
257 commits to master branch, last one 5 years ago
A TUI tool to help you type faster and learn new layouts. Includes a free cat.
Created
2024-05-21
19 commits to master branch, last one 29 days ago
Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.
Created
2020-10-25
99 commits to master branch, last one about a year ago
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dy...
Created
2013-09-21
1,449 commits to master branch, last one 4 days ago
Cluster and merge similar string values: an R implementation of Open Refine clustering algorithms
Created
2017-03-04
246 commits to master branch, last one 9 months ago
Get n-grams from text
Created
2014-09-18
110 commits to main branch, last one 2 years ago
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard si...
jaro, ngram, jaccard, soundex, levenshtein, jaro-winkler, hacktoberfest, jaro-distance, fuzzy-matching, dice-coefficient, hamming-distance, cosine-similarity, soundex-algorithm, string-similarity, jaccard-similarity, levenshtein-distance, jaro-winkler-distance, sorensen-dice-distance, cosine-similarity-scores, longest-common-subsequence
Created
2017-03-02
203 commits to master branch, last one 2 years ago
Fast n-Gram Tokenization
Created
2014-05-06
329 commits to master branch, last one about a year ago
Top-k Approximate String Matching.
Created
2017-02-04
292 commits to master branch, last one 3 years ago
Chinese corpus cleaning and quality evaluation for large-model pre-training
Created
2023-12-12
23 commits to master branch, last one 4 months ago
Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-trained methods (BERT, etc.)
Created
2022-04-05
4 commits to master branch, last one 2 years ago