Table Extraction Python Packages

pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

77.9M 10K 718

pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

27.6M 10K 878

pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.3M 10K 718

kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

165K 8K 472

img2table

img2table is a table identification and extraction Python library for PDFs and images, based on OpenCV image processing.

108K 865 120

bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

14K 1 0

extracttable

Python library to extract tabular data from images and scanned PDFs

5K 286 35

mdify-cli

MDify is a document-to-Markdown conversion library for extracting structured content from complex PDFs and document images, including tables, charts, and scanned documents.

3K 0 0

tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust

1K 9 1

gridgulp

Automatically detect and extract tables from Excel, CSV, and text files.

962 12 1

grits-metric

GriTS metric for table extraction

448 2 0

docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

431 2K 143

pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

397 10K 878

pdfstructx

Intelligent PDF parser with font-aware structure detection, table extraction, and multi-column support

227 0 0

depdf

PDF table & paragraph extractor

222 11 0

tablecv

TableCV: Table extraction from images made easy.

173 11 2

mseep-kreuzberg

162 8K 477

aqpymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

161 10K 718

ragtable-extract

PDF table extraction for RAG and LLM — convert PDF tables to clean HTML. Fast, local, no GPU. Handles merged cells, line-wrapped text, no serialization.

125 1 0

pdftablr

A fork of Kyle Cronan's Python 2.5 pdftable library, now updated for Python 3

122 2 0

quipucamayoc

dev repo for article

119 33 5
