80 results found

4.5k
31.1k
mit
562
💫 Industrial-strength Natural Language Processing (NLP) in Python
Created 2014-07-03
16,225 commits to master branch, last one about a month ago
Easy token price estimates for 400+ LLMs. TokenOps.
Created 2023-12-03
257 commits to main branch, last one 11 days ago
73
1.6k
apache-2.0
24
A suite of image and video neural tokenizers
This repository has been archived
Created 2024-10-30
51 commits to main branch, last one about a month ago
168
1.4k
other
29
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...
Created 2021-03-16
3,454 commits to master branch, last one 10 months ago
Ravencoin Core integration/staging tree
Created 2017-07-09
16,643 commits to master branch, last one 10 months ago
103
965
mit
25
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
👑 spaCy building blocks and visualizers for Streamlit apps
Created 2020-06-23
85 commits to master branch, last one about a year ago
103
747
apache-2.0
24
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Created 2021-01-08
114 commits to master branch, last one 5 months ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created 2017-02-07
77 commits to master branch, last one 2 years ago
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Created 2023-05-12
196 commits to main branch, last one about a year ago
94
557
apache-2.0
30
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Created 2018-03-13
157 commits to master branch, last one about a year ago
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Created 2012-05-21
245 commits to master branch, last one 2 months ago
45
363
unknown
15
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.
Created 2012-08-12
4 commits to main branch, last one 7 years ago
15
352
apache-2.0
7
🎤 vibrato: Viterbi-based accelerated tokenizer
Created 2022-07-06
174 commits to main branch, last one 22 days ago
Sudachi in Rust 🦀 and new generation of SudachiPy
Created 2019-11-23
482 commits to develop branch, last one 2 months ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
38
287
unknown
11
The official code 👩‍💻 for - TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
Created 2024-02-27
4 commits to master branch, last one 23 days ago
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Created 2024-06-13
15 commits to main branch, last one 8 months ago
32
260
mit
14
Rule-based token and sentence segmentation for the Russian language
Created 2018-11-10
98 commits to master branch, last one about a year ago
51
257
agpl-3.0
25
CodeChain's official implementation in Rust.
Created 2018-01-23
3,775 commits to master branch, last one 4 years ago
TokenScript schema, specs and paper
Created 2019-02-06
563 commits to main branch, last one 8 months ago
4
240
unknown
4
[Paper] [AAAI 2025] (MyGO) Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation
Created 2024-04-15
11 commits to main branch, last one 2 months ago
10
235
apache-2.0
2
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Created 2021-08-18
263 commits to main branch, last one 22 days ago
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Created 2019-09-07
38 commits to master branch, last one 2 years ago
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand...
Created 2021-11-08
19 commits to main branch, last one 2 years ago
12
154
mit
5
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Created 2021-01-18
233 commits to main branch, last one 3 months ago
43
153
apache-2.0
4
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Created 2020-01-15
309 commits to master branch, last one 4 months ago
17
151
unknown
10
A unified tokenization tool for Images, Chinese and English.
Created 2021-12-22
14 commits to main branch, last one about a year ago