11 results found

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Created 2015-09-01
144 commits to master branch, last one about a month ago
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4 / GPT-4o. Port of OpenAI's tiktoken with additional features.
Created 2023-03-22
93 commits to main branch, last one 7 days ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
Ready-made tokenizer library for working with GPT and tiktoken
Created 2023-02-02
140 commits to main branch, last one 4 months ago
Byte Pair Encoding for Python!
Created 2017-09-22
89 commits to master branch, last one 2 years ago
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, and BP-e
Created 2017-09-18
242 commits to master branch, last one 2 months ago
Fast bare-bones BPE for modern tokenizer training
Created 2023-11-09
51 commits to main branch, last one about a month ago
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
Created 2022-12-21
13 commits to main branch, last one 11 months ago
GPT3 encoder & decoder tool written in Swift
Created 2023-02-08
13 commits to master branch, last one about a year ago
Simple-to-use scoring function for arbitrarily tokenized texts.
Created 2023-04-29
21 commits to main branch, last one 17 days ago