Search Results - RepositoryStats

ngram2vec zhezhaoa

174

849

unknown

63

Four word embedding models implemented in Python, supporting arbitrary context features.

svd ppmi word glove ngram n-gram analogy chinese word2vec embedding ngram2vec word-embedding

Created 2017-07-16

257 commits to master branch, last one 5 years ago

ngrrram wintermute-cell

16

662

gpl-3.0

3

A TUI tool to help you type faster and learn new layouts. Includes a free cat.

cat cli tui rust ngram dvorak layout typing colemak touchtyping

Created 2024-05-21

19 commits to master branch, last one 4 months ago

ngram-type ranelpadon

41

225

unknown

6

Touch typing trainer using N-grams as data source, with options to customize the auto-generated lessons and specify the minimum typing performance needed. There are sound/color effects as well.

vue keybr ngram dvorak norman qwerty colemak amphetype monkeytype touch-typing lesson-generator

Created 2020-10-25

99 commits to master branch, last one about a year ago

colibri-core proycon

20

126

gpl-3.0

11

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, either of fixed or dy...

nlp ngram corpus ngrams python library skipgram c-plus-plus linguistics text-processing pattern-recognition computational-linguistics

Created 2013-09-21

1,449 commits to master branch, last one 3 months ago

refinr ChrisMuir

5

104

unknown

7

Cluster and merge similar string values: an R implementation of OpenRefine clustering algorithms

r cran ngram rstats clustering openrefine data-cleaning fuzzy-matching data-clustering approximate-string-matching

Created 2017-03-04

246 commits to master branch, last one about a year ago

n-gram words

17

79

mit

5

Get n-grams from text

ngram bigram dugram n-gram trigram unigram hexagram octogram enneagram heptagram pentagram tetragram

Created 2014-09-18

110 commits to main branch, last one 2 years ago

stringdistance vickumar1981

14

78

other

5

A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard si...

Created 2017-03-02

203 commits to master branch, last one 2 years ago

ngram wrathematics

24

71

other

11

Fast n-Gram Tokenization

r text ngram text-mining

Created 2014-05-06

329 commits to master branch, last one about a year ago

suggest suggest-go

6

67

mit

8

Top-k Approximate String Matching.

ngram autocomplete fuzzy-search spellchecker search-engine golang-library language-model fuzzy-string-matching top-k-approximate-string-matching

Created 2017-02-04

292 commits to master branch, last one 4 years ago

llm_corpus_quality jiangnanboy

6

57

unknown

1

Chinese corpus cleaning and quality evaluation for large-model pre-training

llm java ngram corpus-cleaning

Created 2023-12-12

23 commits to master branch, last one 8 months ago

gramify 0xVavaldi

3

32

apache-2.0

1

Create n-grams of wordlists based on words, characters, or charsets to use in offline password attacks and data analysis

jtr ngram hashcat mdxfind password password-analysis password-cracking

Created 2022-04-25

37 commits to master branch, last one 9 months ago

Chinese-Tokenization JackHCC

4

32

unknown

1

Chinese word segmentation implemented with traditional methods (N-gram, HMM, etc.), neural network methods (CNN, LSTM, etc.), and pre-training methods (BERT, etc.)...

nlp ngram bert-crf bilstm-crf tokenization hmm-viterbi-algorithm

Created 2022-04-05

4 commits to master branch, last one 2 years ago

Ngram-Spark-Wikipedia AsadiAhmad

0

27

mit

1

Calculating Ngram with PySpark for wikipedia text

nlp ngram spark pyspark big-data wikipedia-dataset

Created 2024-06-03

9 commits to main branch, last one 10 months ago