69 results found
- Filter by Primary Language:
- Python (24)
- Jupyter Notebook (7)
- Rust (6)
- C++ (5)
- Go (4)
- C (4)
- JavaScript (4)
- TypeScript (3)
- Java (3)
- Solidity (2)
- C# (2)
- PHP (1)
- Julia (1)
- HTML (1)
- TeX (1)
💫 Industrial-strength Natural Language Processing (NLP) in Python
Created
2014-07-03
16,212 commits to master branch, last one 14 days ago
Easy token price estimates for 400+ LLMs. TokenOps.
Created
2023-12-03
222 commits to main branch, last one 7 days ago
LunaSec - Dependency Security Scanner that automatically notifies you about vulnerabilities like Log4Shell or node-ipc in your Pull Requests and Builds. Protect yourself in 30 seconds with the LunaTra...
Created
2021-03-16
3,454 commits to master branch, last one 6 months ago
Secure Vault for Customer PII/PHI/PCI/KYC Records
Created
2019-12-08
1,078 commits to master branch, last one 2 days ago
Ravencoin Core integration/staging tree
Created
2017-07-09
16,643 commits to master branch, last one 6 months ago
Unsupervised text tokenizer focused on computational efficiency
This repository has been archived
Created
2019-06-06
85 commits to master branch, last one about a year ago
👑 spaCy building blocks and visualizers for Streamlit apps
Created
2020-06-23
85 commits to master branch, last one about a year ago
All the slides, accompanying code, and exercises are stored in this repo. 🎈
Created
2018-02-07
280 commits to master branch, last one about a year ago
Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Created
2021-01-08
114 commits to master branch, last one 24 days ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created
2017-02-07
77 commits to master branch, last one 2 years ago
Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing
Created
2018-03-13
157 commits to master branch, last one about a year ago
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript
Created
2023-05-12
196 commits to main branch, last one 11 months ago
PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language
Created
2012-05-21
243 commits to master branch, last one about a year ago
ClangKit provides an Objective-C frontend to LibClang. Source tokenization, diagnostics and fix-its are currently implemented.
Created
2012-08-12
4 commits to main branch, last one 7 years ago
🎤 vibrato: Viterbi-based accelerated tokenizer
Created
2022-07-06
172 commits to main branch, last one about a month ago
Sudachi in Rust 🦀 and new generation of SudachiPy
Created
2019-11-23
404 commits to develop branch, last one 4 months ago
Fast and customizable text tokenization library with BPE and SentencePiece support
Created
2017-02-14
601 commits to master branch, last one about a year ago
CodeChain's official implementation in Rust.
Created
2018-01-23
3,775 commits to master branch, last one 3 years ago
OmniTokenizer: one model and one weight for image-video joint tokenization.
Created
2024-06-13
15 commits to main branch, last one 4 months ago
Rule-based token and sentence segmentation for the Russian language
Created
2018-11-10
98 commits to master branch, last one about a year ago
TokenScript schema, specs and paper
Created
2019-02-06
563 commits to main branch, last one 4 months ago
🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer
Created
2021-08-18
260 commits to main branch, last one 22 hours ago
NLP Cheat Sheet, Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Created
2019-09-07
38 commits to master branch, last one about a year ago
[Paper][Preprint 2024] MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion
Created
2024-04-15
9 commits to main branch, last one 5 months ago
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand...
Created
2021-11-08
19 commits to main branch, last one 2 years ago
A unified tokenization tool for Images, Chinese and English.
Created
2021-12-22
14 commits to main branch, last one about a year ago
Simple NLP in Rust with Python bindings
Created
2018-11-05
143 commits to main branch, last one 4 years ago
Minimal, OpenSSL-less and super lightweight JWT library written in C.
Created
2020-01-15
309 commits to master branch, last one 14 days ago
Simple multilingual lemmatizer for Python, especially useful for speed and efficiency
Created
2021-01-18
230 commits to main branch, last one 14 hours ago
Fast bare-bones BPE for modern tokenizer training
Created
2023-11-09
52 commits to main branch, last one 16 days ago