77 results found

4.4k
30.6k
mit
562
💫 Industrial-strength Natural Language Processing (NLP) in Python
Created 2014-07-03
16,218 commits to master branch, last one about a month ago
Easy token price estimates for 400+ LLMs. TokenOps.
Created 2023-12-03
245 commits to main branch, last one about a month ago
168
1.4k
other
29
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...
Created 2021-03-16
3,454 commits to master branch, last one 8 months ago
56
1.4k
apache-2.0
24
A suite of image and video neural tokenizers
Created 2024-10-30
31 commits to main branch, last one 5 days ago
Secure Vault for Customer PII/PHI/PCI/KYC Records
Created 2019-12-08
1,104 commits to master branch, last one 14 days ago
Ravencoin Core integration/staging tree
Created 2017-07-09
16,643 commits to master branch, last one 8 months ago
102
965
mit
26
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
👑 spaCy building blocks and visualizers for Streamlit apps
Created 2020-06-23
85 commits to master branch, last one about a year ago
102
743
apache-2.0
24
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Created 2021-01-08
114 commits to master branch, last one 3 months ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created 2017-02-07
77 commits to master branch, last one 2 years ago
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Created 2023-05-12
196 commits to main branch, last one about a year ago
93
557
apache-2.0
32
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Created 2018-03-13
157 commits to master branch, last one about a year ago
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Created 2012-05-21
245 commits to master branch, last one 15 days ago
46
361
unknown
17
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are currently implemented.
Created 2012-08-12
4 commits to main branch, last one 7 years ago
14
342
apache-2.0
7
🎤 vibrato: Viterbi-based accelerated tokenizer
Created 2022-07-06
173 commits to main branch, last one 14 days ago
Sudachi in Rust 🦀 and new generation of SudachiPy
Created 2019-11-23
482 commits to develop branch, last one 2 days ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
Created 2024-06-13
15 commits to main branch, last one 6 months ago
51
258
agpl-3.0
26
CodeChain's official implementation in Rust.
Created 2018-01-23
3,775 commits to master branch, last one 4 years ago
31
251
mit
15
Rule-based token and sentence segmentation for the Russian language
Created 2018-11-10
98 commits to master branch, last one about a year ago
TokenScript schema, specs and paper
Created 2019-02-06
563 commits to main branch, last one 6 months ago
9
231
apache-2.0
3
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Created 2021-08-18
261 commits to main branch, last one about a month ago
4
231
unknown
4
[Paper] [AAAI 2025] (MyGO) Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation
Created 2024-04-15
11 commits to main branch, last one 24 days ago
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Created 2019-09-07
38 commits to master branch, last one about a year ago
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand...
Created 2021-11-08
19 commits to main branch, last one 2 years ago
43
152
apache-2.0
5
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Created 2020-01-15
309 commits to master branch, last one 2 months ago
17
151
unknown
10
A unified tokenization tool for Images, Chinese and English.
Created 2021-12-22
14 commits to main branch, last one about a year ago
11
150
apache-2.0
5
Simple NLP in Rust with Python bindings
Created 2018-11-05
143 commits to main branch, last one 4 years ago
12
149
mit
7
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Created 2021-01-18
233 commits to main branch, last one about a month ago