107 results found

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
This repository has been archived
Created 2017-03-25
364 commits to master branch, last one 5 months ago
1.1k
7.5k
apache-2.0
116
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
This repository has been archived
Created 2018-01-23
44 commits to master branch, last one 5 years ago
140
5.9k
mit
28
Intuitive find & replace CLI (sed alternative)
Created 2018-12-23
320 commits to master branch, last one 6 months ago
528
5.7k
agpl-3.0
64
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Created 2012-10-06
2,666 commits to main branch, last one 2 days ago
448
3.1k
apache-2.0
82
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Created 2018-03-07
2,484 commits to master branch, last one about a year ago
280
2.2k
mit
27
Python library for creating PEG parsers
Created 2017-05-14
1,351 commits to master branch, last one 2 days ago
50
1.4k
mit
12
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Created 2024-11-01
189 commits to main branch, last one a day ago
Persian NLP Toolkit
Created 2013-10-29
1,411 commits to master branch, last one 5 months ago
Program to convert lines of text into a tree structure.
Created 2020-04-30
70 commits to master branch, last one 2 years ago
66
1.2k
apache-2.0
12
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Created 2020-11-27
105 commits to main branch, last one 3 months ago
96
1.0k
unlicense
19
A fast implementation of Aho-Corasick in Rust.
Created 2015-06-11
293 commits to master branch, last one about a month ago
274
987
apache-2.0
46
Thai natural language processing in Python
Created 2016-06-23
4,785 commits to dev branch, last one a day ago
28
895
mpl-2.0
17
A fast and convenient fuzzy matcher library for Rust
Created 2023-07-27
98 commits to master branch, last one 8 days ago
19
699
unlicense
8
A sharp cut(1) clone.
Created 2021-06-24
192 commits to master branch, last one 5 months ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created 2017-02-07
77 commits to master branch, last one 2 years ago
A simple Python module for parsing human names into their individual components
Created 2014-04-02
409 commits to master branch, last one about a year ago
Natural language detection library for Go
Created 2017-02-20
34 commits to master branch, last one 5 years ago
71
622
apache-2.0
4
All-in-one text de-duplication
Created 2021-03-13
363 commits to main branch, last one 6 months ago
Open Korean Text Processor - An Open-source Korean Text Processor
Created 2017-01-24
799 commits to master branch, last one 8 months ago
69
481
apache-2.0
10
Text Normalization & Inverse Text Normalization
Created 2022-08-23
192 commits to master branch, last one 10 days ago
67
479
gpl-3.0
31
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created 2010-07-06
2,160 commits to master branch, last one about a year ago
85
451
gpl-3.0
36
pyarabic
Created 2014-02-17
160 commits to master branch, last one 10 months ago
118
403
gpl-3.0
8
Automatic Korean word spacing with Python
Created 2018-04-19
87 commits to master branch, last one 4 months ago
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Created 2018-08-08
15 commits to master branch, last one 6 years ago
A low level regular expression library that uses deterministic finite automata.
This repository has been archived
Created 2019-01-04
95 commits to master branch, last one about a year ago
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RV...
Created 2024-03-20
315 commits to main branch, last one 9 days ago
17
317
mit
8
SQL Swiss Army Knife - Engine for Diverse Data Sources
Created 2018-01-29
529 commits to master branch, last one 2 days ago
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Created 2016-04-02
132 commits to master branch, last one 3 months ago
44
304
other
22
Fast and portable character string processing in R (with the Unicode ICU)
Created 2013-01-05
1,684 commits to master branch, last one 4 months ago