Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Date Created 2022-09-26 (2 years ago)
Commits 1,613 (last one 6 days ago)
Stargazers 9,027 (20 this week)
Watchers 59 (0 this week)
Forks 743
License apache-2.0
Ranking

RepositoryStats indexes 579,238 repositories. Of these, Unstructured-IO/unstructured is ranked #4,470 (99th percentile) for total stargazers and #34,088 for total watchers. GitHub reports the primary language for this repository as HTML; among repositories using this language, it is ranked #92/15,064.
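As a rough check on the percentile figure, the sketch below converts a rank and an index size into a percentile. The formula (share of indexed repositories ranked below this one, rounded down) is an assumption; the page does not say how RepositoryStats computes or rounds its percentiles, and `percentile_from_rank` is a hypothetical helper name.

```python
import math


def percentile_from_rank(rank: int, total: int) -> float:
    """Percentage of indexed repositories ranked below the given rank.

    Assumed formula, not RepositoryStats' documented method.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return 100.0 * (total - rank) / total


# Figures taken from the ranking paragraph above.
TOTAL_INDEXED = 579_238
stars_pct = percentile_from_rank(4_470, TOTAL_INDEXED)
watchers_pct = percentile_from_rank(34_088, TOTAL_INDEXED)

print(f"stargazers: {math.floor(stars_pct)}th percentile")
print(f"watchers:   {watchers_pct:.1f} (not reported on the page)")
```

Under this assumption the stargazer rank lands in the 99th percentile, which matches the figure quoted on the page.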

Unstructured-IO/unstructured is also tagged with popular topics. Its rank within each topic is: deep-learning (#175/8339), machine-learning (#190/7866), llm (#77/2654), nlp (#45/2392), natural-language-processing (#38/1402), pdf (#27/985), langchain (#11/629), ml (#15/594), ocr (#15/583), information-retrieval (#8/213).

Other Information

Unstructured-IO/unstructured has 40 open pull requests on GitHub; 1,966 pull requests have been merged over the lifetime of the repository.

GitHub issues are enabled; there are 191 open issues and 930 closed issues.

There have been 157 releases; the latest, named 0.16.4, was published on 2024-10-31 (6 days ago).

Homepage URL: https://www.unstructured.io/

Star History

GitHub stargazers over time

Watcher History

GitHub watchers over time; collection started in '23

Recent Commit History

1,613 commits on the default branch (main) since Jan '22

Yearly Commits

Commits to the default branch (main) per year

Issue History

Languages

The primary language is HTML, but there are also others...

updated: 2024-11-07 @ 12:02am, id: 541798154 / R_kgDOIEsvCg