Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Date Created 2022-09-26 (2 years ago)
Commits 1,706 (last one 3 days ago)
Stargazers 10,637 (83 this week)
Watchers 69 (0 this week)
Forks 884
License apache-2.0
Ranking

RepositoryStats indexes 630,443 repositories. Of these, Unstructured-IO/unstructured is ranked #3,882 (99th percentile) for total stargazers and #28,291 for total watchers. GitHub reports HTML as the primary language for this repository; among repositories using this language, it is ranked #82 of 16,460.

Unstructured-IO/unstructured is also tagged with popular topics, where it ranks as follows: deep-learning (#152/8798), machine-learning (#167/8384), llm (#89/3538), nlp (#36/2516), natural-language-processing (#33/1470), pdf (#28/1072), langchain (#15/753), ml (#14/653), ocr (#16/642), information-retrieval (#7/230).

Other Information

Unstructured-IO/unstructured has 49 open pull requests on GitHub, and 2,059 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 151 open issues and 1,037 closed issues.

There have been 179 releases, the latest one was published on 2025-03-20 (4 days ago) with the name 0.17.2.

Homepage URL: https://www.unstructured.io/

Star History

[Chart: GitHub stargazers over time, 2023 to 2025]

Watcher History

[Chart: GitHub watchers over time; data collection started in 2023]

Recent Commit History

1,706 commits on the default branch (main) since January 2022.

[Chart: commits on the default branch over time, 2023 to 2025]

Yearly Commits

[Chart: commits to the default branch (main) per year]

Issue History

[Chart: total, open, and closed issues over time, 2023 to 2025]

Languages

The primary language is HTML, but the repository also contains other languages:

HTML, Python, Shell, Rich Text Format, Makefile, Dockerfile, XSLT, Go

Updated: 2025-03-24 at 11:39 pm, id: 541798154 / R_kgDOIEsvCg