11 results found
- Filter by Primary Language:
- Python (5)
- C++ (2)
- Go (1)
- Rust (1)
- Swift (1)
- TypeScript (1)
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Created 2015-09-01
144 commits to master branch, last one 4 months ago
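Unsupervised subword segmentation of this kind is commonly learned with byte pair encoding (BPE). Segmentation starts from single characters, and the most frequent adjacent symbol pair is merged over and over. The sketch below is a minimal Python illustration of that merge-learning loop. It assumes whitespace-split words with counts and an end-of-word marker `</w>`. `learn_bpe` and the toy corpus are hypothetical and are not this repository's code or interface.

```python
from collections import Counter


def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Hypothetical helper: learn an ordered list of BPE merges from word frequencies."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair (ties go to the pair counted first).
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, count in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = new_vocab.get(tuple(out), 0) + count
        vocab = new_vocab
    return merges


merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)
print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>')]
```

Real implementations add pair-count caching and vocabulary thresholds for speed. This version recounts all pairs each round so the logic stays easy to follow.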
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4 / GPT-4o / GPT-o1. Port of OpenAI's tiktoken with additional features.
Created 2023-03-22
115 commits to main branch, last one 12 days ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
Ready-made tokenizer library for working with GPT and tiktoken
Created 2023-02-02
156 commits to main branch, last one about a month ago
Byte Pair Encoding for Python!
Created 2017-09-22
89 commits to master branch, last one 2 years ago
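After a BPE merge list has been learned, encoding a new word means applying the highest-priority (earliest-learned) merge that fits until none apply. The following is a minimal Python sketch under stated assumptions. Merges are given as an ordered list of symbol pairs, and words end with a `</w>` marker. `apply_bpe` and the example merge list are hypothetical and do not reflect this package's actual API.

```python
def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Hypothetical helper: segment one word by applying merges in priority order."""
    symbols = list(word) + ["</w>"]
    # Lower rank = learned earlier = higher priority.
    ranks = {pair: i for i, pair in enumerate(merges)}
    while len(symbols) > 1:
        # Collect (rank, position) for every adjacent pair that has a learned merge.
        candidates = [
            (ranks[pair], i)
            for i, pair in enumerate(zip(symbols, symbols[1:]))
            if pair in ranks
        ]
        if not candidates:
            break  # no learned merge applies any more
        _, i = min(candidates)  # best-ranked pair, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


example_merges = [("e", "s"), ("es", "t"), ("est", "</w>"), ("l", "o"), ("lo", "w")]
print(apply_bpe("lowest", example_merges))  # ['low', 'est</w>']
```

Merging by rank rather than by left-to-right scan reproduces the order in which merges were learned. That order is what makes encoding consistent with training.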
nfelib - Python bindings for reading and managing XML for NF-e, national NFS-e, CT-e, MDF-e, and BP-e
Created 2017-09-18
242 commits to master branch, last one 5 months ago
Fast bare-bones BPE for modern tokenizer training
Created 2023-11-09
52 commits to main branch, last one 2 months ago
Go BPE tokenizer (Encoder+Decoder) for GPT2 and GPT3
Created 2022-12-21
14 commits to main branch, last one 19 days ago
Simple-to-use scoring function for arbitrarily tokenized texts.
Created 2023-04-29
26 commits to main branch, last one 29 days ago
GPT3 encoder & decoder tool written in Swift
Created 2023-02-08
13 commits to master branch, last one about a year ago