109 results found

⚡ From finding text to search and replace, from sorting to beautifying text and more 🎨
This repository has been archived
Created 2017-03-25
364 commits to master branch, last one 6 months ago
1.1k
7.6k
apache-2.0
116
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
This repository has been archived
Created 2018-01-23
44 commits to master branch, last one 5 years ago
546
6.0k
agpl-3.0
64
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Created 2012-10-06
2,718 commits to main branch, last one 3 days ago
140
6.0k
mit
28
Intuitive find & replace CLI (sed alternative)
Created 2018-12-23
320 commits to master branch, last one 7 months ago
448
3.1k
apache-2.0
82
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Created 2018-03-07
2,484 commits to master branch, last one 2 years ago
282
2.2k
mit
27
Python library for creating PEG parsers
Created 2017-05-14
1,355 commits to master branch, last one 6 days ago
76
1.9k
mit
14
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library
Created 2024-11-01
323 commits to main branch, last one 14 days ago
Persian NLP Toolkit
Created 2013-10-29
1,411 commits to master branch, last one 6 months ago
Program to convert lines of text into a tree structure.
Created 2020-04-30
70 commits to master branch, last one 3 years ago
67
1.2k
apache-2.0
12
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Created 2020-11-27
108 commits to main branch, last one 4 days ago
97
1.0k
unlicense
19
A fast implementation of Aho-Corasick in Rust.
Created 2015-06-11
293 commits to master branch, last one 2 months ago
273
994
apache-2.0
46
Thai natural language processing in Python
Created 2016-06-23
4,833 commits to dev branch, last one 4 days ago
28
925
mpl-2.0
16
A fast and convenient fuzzy matcher library for Rust
Created 2023-07-27
114 commits to master branch, last one 6 days ago
18
702
unlicense
8
A sharp cut(1) clone.
Created 2021-06-24
197 commits to master branch, last one 18 days ago
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashta...
Created 2017-02-07
77 commits to master branch, last one 2 years ago
A simple Python module for parsing human names into their individual components
Created 2014-04-02
409 commits to master branch, last one about a year ago
Natural language detection library for Go
Created 2017-02-20
34 commits to master branch, last one 5 years ago
71
635
apache-2.0
4
All-in-one text de-duplication
Created 2021-03-13
363 commits to main branch, last one 7 months ago
Open Korean Text Processor - An Open-source Korean Text Processor
Created 2017-01-24
799 commits to master branch, last one 9 months ago
69
495
apache-2.0
10
Text Normalization & Inverse Text Normalization
Created 2022-08-23
192 commits to master branch, last one about a month ago
68
479
gpl-3.0
31
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks su...
Created 2010-07-06
2,160 commits to master branch, last one about a year ago
85
451
gpl-3.0
37
pyarabic
Created 2014-02-17
160 commits to master branch, last one 11 months ago
117
405
gpl-3.0
8
Automatic Korean word spacing with Python
Created 2018-04-19
87 commits to master branch, last one 5 months ago
🗣️ Tool to generate adversarial text examples and test machine learning models against them
Created 2018-08-08
15 commits to master branch, last one 6 years ago
Turn PDFs and EPUBs into audiobooks, subtitles or videos into dubbed videos (including translation), and more. For free. Pandrator uses local models, notably XTTS, including voice-cloning (instant, RV...
Created 2024-03-20
315 commits to main branch, last one about a month ago
A low level regular expression library that uses deterministic finite automata.
This repository has been archived
Created 2019-01-04
95 commits to master branch, last one about a year ago
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
Created 2016-04-02
132 commits to master branch, last one 4 months ago
45
308
other
22
Fast and portable character string processing in R (with the Unicode ICU)
Created 2013-01-05
1,684 commits to master branch, last one 5 months ago