Statistics for topic ocr
RepositoryStats tracks 604,717 Github repositories, of these 619 are tagged with the ocr topic. The most common primary language for repositories using this topic is Python (298). Other languages include: Jupyter Notebook (43), C++ (42), TypeScript (32), Java (28), JavaScript (27), C# (26), HTML (11), Swift (11)
Stargazers over time for topic ocr
Most starred repositories for topic ocr (view more)
Trending repositories for topic ocr (view more)
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
Open-source document processing platform built for knowledge workers
快如闪电的硬字幕提取工具。仅需苹果M1芯片或英伟达3060显卡即可达到10倍速提取。A very fast tool for video hardcode subtitle extraction
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
Open-source document processing platform built for knowledge workers
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO...
快如闪电的硬字幕提取工具。仅需苹果M1芯片或英伟达3060显卡即可达到10倍速提取。A very fast tool for video hardcode subtitle extraction
OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for seamless, high-performing & accessible OCR
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO...
Docker Image with latest Tesseract OCR Version 5.x.x built from sources
Open-source document processing platform built for knowledge workers
Python3 package for Chinese/English OCR, with paddleocr-v4 onnx model(~14MB). 基于ppocr-v4-onnx模型推理,可实现 CPU 上毫秒级的 OCR 精准预测,通用场景中英文OCR达到开源SOTA。
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Python3 package for Chinese/English OCR, with paddleocr-v4 onnx model(~14MB). 基于ppocr-v4-onnx模型推理,可实现 CPU 上毫秒级的 OCR 精准预测,通用场景中英文OCR达到开源SOTA。
快如闪电的硬字幕提取工具。仅需苹果M1芯片或英伟达3060显卡即可达到10倍速提取。A very fast tool for video hardcode subtitle extraction
ocr-docker is small, Flask powerd web app, helps us to extract text from images and pdf document using OCR
身份证OCR智能识别、证件提取以及验证码自动化解析功能,项目核心基于深度学习技术。模型、数据集、finetune和API支持,欢迎自取使用,并持续关注我输出的更多模型。V:chenganp
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO...
OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your memor...
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
A privacy-first, self-hosted, fully open source personal knowledge management software, written in typescript and golang.
OCR software, free and offline. 开源、免费的离线OCR软件。支持截屏/批量导入图片,PDF文档识别,排除水印/页眉页脚,扫描/生成二维码。内置多国语言库。
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and de...
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSO...
Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
OpenRecall is a fully open-source, privacy-first alternative to proprietary solutions like Microsoft's Windows Recall. With OpenRecall, you can easily access your digital history, enhancing your memor...
AI-powered organization and chat assistant for Obsidian