PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

77.9M 10K 718
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

27.6M 10K 878
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.3M 10K 718
kreuzberg-dev
kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

165K 8K 472
xavctn
img2table

img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing

108K 865 120
harubi
bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

14K 1 0
ExtractTable
extracttable

Python library to extract tabular data from images and scanned PDFs

5K 286 35
tiroq
mdify-cli

MDify is a document-to-Markdown conversion library for extracting structured content from complex PDFs and document images, including tables, charts, and scanned documents.

3K 0 0
monchin
tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust

1K 9 1
Ganymede-Bio
gridgulp

Automatically detect and extract tables from Excel, CSV, and text files.

962 12 1
kensho-technologies
grits-metric

GriTS metric for table extraction

448 2 0
nanonets
docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

431 2K 143
jsvine
pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

397 10K 878
Kyros-Groupe-Ltd
pdfstructx

Intelligent PDF parser with font-aware structure detection, table extraction, and multi-column support

227 0 0
meldonization
depdf

PDF table & paragraph extractor

222 11 0
inquilabee
tablecv

TableCV: Table extraction from images made easy.

173 11 2
Goldziher
mseep-kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, TypeScript (Node/Bun/Wasm/Deno), or use via CLI, REST API, or MCP server.

162 8K 477
pymupdf
aqpymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

161 10K 718
ZhuJiaxin2
ragtable-extract

PDF table extraction for RAG and LLMs: convert PDF tables to clean HTML. Fast, local, no GPU. Handles merged cells, line-wrapped text, no serialization.

125 1 0
philgooch
pdftablr

A fork of Kyle Cronan's Python 2.5 pdftable library, now updated for Python 3

122 2 0
sergiocorreia
quipucamayoc

dev repo for article

119 33 5
Data from PyPI, GitHub, ClickHouse, and BigQuery