68 results found

4.4k
29.7k
mit
562
💫 Industrial-strength Natural Language Processing (NLP) in Python
Created 2014-07-03
16,169 commits to master branch, last one 13 days ago
164
1.4k
other
29
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...
Created 2021-03-16
3,454 commits to master branch, last one 4 months ago
Easy token price estimates for 400+ LLMs. TokenOps.
Created 2023-12-03
202 commits to main branch, last one 5 days ago
Secure Vault for Customer PII/PHI/PCI/KYC Records
Created 2019-12-08
1,066 commits to master branch, last one 11 days ago
Ravencoin Core integration/staging tree
Created 2017-07-09
16,643 commits to master branch, last one 4 months ago
100
953
mit
26
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created 2019-06-06
85 commits to master branch, last one about a year ago
👑 spaCy building blocks and visualizers for Streamlit apps
Created 2020-06-23
85 commits to master branch, last one about a year ago
99
728
apache-2.0
22
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Created 2021-01-08
113 commits to master branch, last one 5 months ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created 2017-02-07
77 commits to master branch, last one about a year ago
93
550
apache-2.0
31
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Created 2018-03-13
157 commits to master branch, last one about a year ago
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Created 2023-05-12
196 commits to main branch, last one 10 months ago
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Created 2012-05-21
243 commits to master branch, last one about a year ago
46
360
unknown
17
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are actually implemented.
Created 2012-08-12
4 commits to main branch, last one 7 years ago
14
325
apache-2.0
7
🎤 vibrato: Viterbi-based accelerated tokenizer
Created 2022-07-06
172 commits to main branch, last one 4 days ago
Sudachi in Rust 🦀 and new generation of SudachiPy
Created 2019-11-23
404 commits to develop branch, last one 3 months ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created 2017-02-14
601 commits to master branch, last one about a year ago
56
258
agpl-3.0
26
CodeChain's official implementation in Rust.
Created 2018-01-23
3,775 commits to master branch, last one 3 years ago
31
248
mit
14
Rule-based token and sentence segmentation for the Russian language
Created 2018-11-10
98 commits to master branch, last one about a year ago
TokenScript schema, specs and paper
Created 2019-02-06
563 commits to main branch, last one 3 months ago
OmniTokenizer: one model and one weight for image-video joint tokenization.
Created 2024-06-13
15 commits to main branch, last one 2 months ago
9
227
apache-2.0
3
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Created 2021-08-18
257 commits to main branch, last one 4 days ago
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Created 2019-09-07
38 commits to master branch, last one about a year ago
4
207
unknown
4
[Paper][Preprint 2024] MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion
Created 2024-04-15
9 commits to main branch, last one 4 months ago
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand...
Created 2021-11-08
19 commits to main branch, last one 2 years ago
17
150
unknown
10
A unified tokenization tool for Images, Chinese and English.
Created 2021-12-22
14 commits to main branch, last one about a year ago
11
148
apache-2.0
5
Simple NLP in Rust with Python bindings
Created 2018-11-05
143 commits to main branch, last one 4 years ago
41
141
apache-2.0
5
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Created 2020-01-15
305 commits to master branch, last one about a month ago
12
140
mit
7
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Created 2021-01-18
224 commits to main branch, last one about a month ago
Fast bare-bones BPE for modern tokenizer training
Created 2023-11-09
51 commits to main branch, last one 29 days ago