PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
py-pdf
pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

56.5M 10K 2K
py-pdf
pypdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.2M 10K 2K
PaddlePaddle
paddleocr

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

2M 77K 10K
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

282K 62K 5K
opendataloader-project
opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

105K 20K 2K
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

81K 717 78
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

77K 62K 5K
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

63K 2K 96
run-llama
liteparse

A fast, helpful, and open-source document parser

30K 5K 326
bzsanti
oxidize-pdf

Python bindings for oxidize-pdf — generate, parse, split, merge & manipulate PDFs with native Rust performance. No C deps, no Java, no subprocesses.

19K 0 0
codereverser
casparser

Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfintech

14K 194 79
michelcrypt4d4mus
pdfalyzer

Analyze PDFs with colors (and YARA)

4K 366 25
opendataloader-project
langchain-opendataloader-pdf

A LangChain integration for OpenDataLoader PDF

3K 32 3
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

3K 62K 5K
titipata
scipdf-parser

Python PDF parser for scientific publications: content and figures

3K 452 65
axzml
pdfmark-ai

Convert PDF files to high-quality Markdown using LLM vision models

3K 0 0
nordinz7
maybankpdf2json

A package for extracting JSON data from Maybank PDF account statements

2K 1 0
ispras
dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

2K 661 52
nanonets
docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 129
ashutoshvarma
pyxpdf

Fast and memory-efficient Python PDF Parser based on xpdf sources

2K 44 17
iamarunbrahma
vision-parse

Parse PDF documents into markdown formatted content using Vision LLMs

2K 469 66
ENDEVSOLS
longparser

Privacy-first document intelligence engine — converts PDFs, DOCX, PPTX, XLSX, and CSV into AI-ready Markdown + structured JSON for RAG pipelines.

2K 15 1
AdemBoukhris457
doctra

Parse, extract, and analyze documents with ease

1K 204 33
madhav921
stmtforge

Open-source Python tool to parse credit card PDF statements from Indian banks (HDFC, ICICI, SBI, Axis + 5 more) into structured data. Offline, privacy-first, Streamlit dashboard. pip install stmtforge

698 0 0

Data from PyPI, GitHub, ClickHouse, and BigQuery