PyPI Stats

OCR Python Packages

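Python packages tagged with the GitHub topic "ocr", sorted by relevance. Each entry shows the GitHub owner, the package name, its description, and a line of figures that includes monthly downloads and stars.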
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

78.7M 10K 718
run-llama
llama-cloud

Python SDK for OCR and document parsing in the cloud with LlamaParse

9.7M 28 7
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

5.3M 15K 1K
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.4M 10K 718
jaidedai
easyocr

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

2.8M 29K 4K
RapidAI
rapidocr

📄 Awesome multi-programming-language OCR toolkits based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.

2.3M 6K 630
PaddlePaddle
paddleocr

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

2.1M 77K 10K
RapidAI
rapidocr-onnxruntime

📄 Awesome multi-programming-language OCR toolkits based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.

1.1M 6K 630
ocrmypdf
ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched.

842K 34K 2K
robocorp
rpaframework

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python

645K 1K 268
robocorp
rpaframework-core

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python

512K 1K 268
robocorp
rpaframework-pdf

Collection of open-source libraries and tools for Robotic Process Automation (RPA), designed to be used with both Robot Framework and Python

476K 1K 268
sirfz
tesserocr

A Python wrapper for the tesseract-ocr API

377K 2K 259
mindee
python-doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

292K 6K 641
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

283K 62K 5K
sml2h3
ddddocr

DdddOcr (带带弟弟): general-purpose CAPTCHA recognition OCR, PyPI edition

253K 14K 2K
Layout-Parser
layoutparser

A Unified Toolkit for Deep Learning Based Document Image Analysis

158K 6K 533
opendataloader-project
opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

112K 20K 2K
felixdittrich92
onnxtr

OnnxTR: an ONNX pipeline wrapper for the docTR (Document Text Recognition) library, for seamless, high-performing & accessible OCR

79K 178 18
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

76K 62K 5K
breezedeus
cnocr

CnOCR: Awesome Chinese/English OCR Python toolkit based on PyTorch/MXNet. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation.

71K 4K 537
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

58K 2K 96
breezedeus
cnstd

CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis

55K 791 115
datafog
datafog

Python SDK for PII detection and redaction in text and images, combining regex + NLP pipelines for production privacy workflows.

53K 54 13
Data from PyPI, GitHub, ClickHouse, and BigQuery