11 results found

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Created 2015-09-01
144 commits to master branch, last one 4 months ago
103
960
mit
26
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4 / GPT-4o / GPT-o1. Port of OpenAI's tiktoken with additional features.
Created 2023-03-22
115 commits to main branch, last one 12 days ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
Ready-made tokenizer library for working with GPT and tiktoken
Created 2023-02-02
156 commits to main branch, last one about a month ago
Byte Pair Encoding for Python!
Created 2017-09-22
89 commits to master branch, last one 2 years ago
59
148
mit
26
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, BP-e
Created 2017-09-18
242 commits to master branch, last one 5 months ago
Fast bare-bones BPE for modern tokenizer training
Created 2023-11-09
52 commits to main branch, last one 2 months ago
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
Created 2022-12-21
14 commits to main branch, last one 19 days ago
Simple-to-use scoring function for arbitrarily tokenized texts.
Created 2023-04-29
26 commits to main branch, last one 29 days ago
GPT3 encoder & decoder tool written in Swift
Created 2023-02-08
13 commits to master branch, last one about a year ago