26 results found

447
1.9k
apache-2.0
58
NCRF++, a Neural Sequence Labeling Toolkit. Easy to apply to any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Created 2017-12-06
127 commits to master branch, last one 4 years ago
118
1.5k
unknown
80
Content-Addressable Data Synchronization Tool
Created 2017-01-13
684 commits to main branch, last one about a year ago
360
400
other
40
An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration
Created 2010-12-23
854 commits to master branch, last one about a month ago
45
349
bsd-3-clause
15
Alternative casync implementation
Created 2017-11-09
309 commits to master branch, last one 8 days ago
10
325
mit
5
Fully neural approach for text chunking
Created 2025-04-08
24 commits to main branch, last one 5 days ago
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Created 2023-11-05
148 commits to main branch, last one 28 days ago
A package for parsing PDFs and analyzing their content using LLMs.
Created 2024-07-26
28 commits to main branch, last one 8 months ago
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate the process of conducting experiments and evaluations using Azure Cognitive Search and RAG pattern.
Created 2023-09-25
631 commits to development branch, last one 23 days ago
11
177
unknown
1
A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.
Created 2024-08-04
12 commits to master branch, last one 2 months ago
Live TS segmenter and HLS manifest creation in Go
Created 2019-05-31
115 commits to master branch, last one 3 years ago
An LLM GUI application; enables you to interact with your files, offering dynamic parameters that can modify response behavior during runtime.
Created 2023-05-09
88 commits to main branch, last one about a year ago
44
88
apache-2.0
5
A modular multimodal framework for AI applications
Created 2024-04-08
4,909 commits to master branch, last one 22 days ago
🍱 semantic-chunking ⇢ semantically create chunks from large document for passing to LLM workflows
Created 2024-02-27
125 commits to main branch, last one about a month ago
📑 Split Laravel jobs into multiple separate job chunks
Created 2022-09-18
50 commits to v1 branch, last one 11 months ago
22
82
apache-2.0
2
An asynchronous event-driven HTTP client based on netty.
Created 2020-12-08
203 commits to main branch, last one 2 years ago
Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines
Created 2024-09-06
85 commits to main branch, last one 18 hours ago
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From R...
Created 2025-02-13
42 commits to main branch, last one 2 months ago
11
64
other
7
Labelling Sequential Data in Natural Language Processing with R - using CRFsuite
Created 2018-08-17
139 commits to master branch, last one about a year ago
FastCDC implementation in Python https://pypi.org/project/fastcdc/
Created 2020-05-07
105 commits to master branch, last one 10 months ago
Incremental asset delivery library
Created 2019-07-04
603 commits to main branch, last one 5 months ago
Extract and align grammar patterns from English sentences.
Created 2018-06-07
13 commits to master branch, last one 5 years ago
Build document-native LLM applications
This repository has been archived
Created 2024-07-30
20 commits to main branch, last one 7 months ago
LLM chatbot with Retrieval-Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.
Created 2023-09-12
64 commits to main branch, last one about a year ago
DocumentAtom provides a light, fast library for breaking input documents into constituent parts (atoms), useful for text processing, analysis, and artificial intelligence.
Created 2024-12-30
45 commits to main branch, last one 16 days ago
An Overview of the Latest Document Chunking Research
Created 2024-11-19
21 commits to main branch, last one 5 months ago
BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great fo...
Created 2025-02-14
8 commits to main branch, last one 27 days ago