PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

77.9M 10K 718
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.3M 10K 718
ocrmypdf
ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

835K 34K 2K
sirfz
tesserocr

A Python wrapper for the tesseract-ocr API

365K 2K 259
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

165K 8K 472
tebelorg
tagui

Python package for doing RPA

12K 5K 724
tebelorg
rpa

Python package for doing RPA

11K 5K 724
xiahongze
pysseract

Python binding to Tesseract 4.0 API

4K 1 1
sivakumar-mahalingam
fastmrz

⚡Extracting the Machine Readable Zone (MRZ) from passport or any document images

4K 176 39
amenezes
aiopytesseract

A Python asyncio wrapper for Tesseract-OCR.

2K 27 7
vietanhdev
anyocr

A lightweight, unified OCR toolkit with a one-liner API. Supports Surya, EasyOCR, PaddleOCR, Tesseract, and Vision LLMs through a single interface.

2K 0 0
wolfmanstout
screen-ocr

Easily perform OCR on portions of the screen, choosing from a selection of backends.

697 50 7
Lucs1590
nkocr

A module for performing specialized OCR on food products and nutritional tables.

628 39 11
icaropires
pdf2dataset

Converts a whole subdirectory with a big (or small) volume of PDF documents to a dataset (pandas DataFrame) with error tracking and choice of features

596 19 5
fullstackcrew-alpha
privacy-mask

Automatically redacts sensitive data in screenshots before sending to AI agents

594 9 1
AxaFrance
axa-fr-ocr

AXA France OCR library

462 3 0
nometria
medical-ocr

Multi-engine OCR pipeline for medical and legal documents

419 1 0
engeir
northern-lights-forecast

A northern lights forecast that automatically sends a Telegram notification during substorm events.

389 1 0
pasteurlabs
tesseract-jax

Execute + differentiate Tesseracts as part of JAX programs, with full support for function transformations like JIT, grad, and more. ⚡

357 31 3
egemenzeytinci
readmrz

Machine readable zone reader on ID cards

346 19 10
StabRise
pyspark-pdf

PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them

291 81 4
bandrel
ocyara

A Yara rule engine that scans images for matches using Optical Character Recognition (OCR). See the GitHub page for more information about the Cython, Tesseract, and Leptonica prerequisites.

267 42 8
hansalemaos
multitessiocr

Performs a very fast OCR on a list of images (file path, url, base64, bytes, numpy, PIL ...) using Tesseract and returns the recognized text, its coordinates, and line-based word grouping in a DataFrame.

243 0 0
hansalemaos
a-pandas-ex-tesseract-multirow-regex-fuzz

Regex/Fuzz search across multiple rows/Tesseract to pandas.DataFrame

239 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery