PDF Parsing Python Packages

pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

56.5M 10K 2K

pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

27.6M 10K 878

pypdf2

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

25.2M 10K 2K

py-pdf-parser

A Python tool to help extract information from structured PDFs.

28K 430 49

bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

14K 1 0

citracer

💬 Trace citation chains for any concept across research papers and render them as an interactive graph.

5K 19 0

preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4

iqdmpdf

A collection of PDF data mining scripts for various IMRT QA vendors

1K 13 2

flash-mineru

Fast Inference Architecture for MinerU

793 49 7

pypdf-fork

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

449 10K 2K

pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

397 10K 878

depdf

PDF table & paragraph extractor

222 11 0

literature-mapper

Transform academic PDFs into a Knowledge Graph with typed claims, temporal analysis, bibliometric tools, and grounded LLM synthesis that cites only your corpus.

175 8 3

pdf4py

A PDF parser written in Python 3 with no external dependencies.

173 56 3
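
To give a sense of what dependency-free PDF parsing involves, here is a minimal standard-library sketch that reads two structural markers every PDF carries: the `%PDF-x.y` version header and the `startxref` offset near the end of the file. This is not pdf4py's API; the function names are hypothetical, and a real parser would go on to read the cross-reference table and the objects it points to.

```python
import re


def read_pdf_version(data: bytes) -> str:
    """Return the version from the '%PDF-x.y' header at the start of the file."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    if match is None:
        raise ValueError("not a PDF: missing %PDF- header")
    return match.group(1).decode("ascii")


def find_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the final 'startxref' keyword.

    Only the last 1024 bytes are searched, since the keyword sits just
    before the closing '%%EOF' marker.
    """
    tail = data[-1024:]
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", tail)
    if match is None:
        raise ValueError("no startxref/%%EOF trailer found")
    return int(match.group(1))
```

For a file on disk, both functions can be called on the result of `open(path, "rb").read()`.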

pdf-bank-statement-parser

Command-line tool for converting PDF bank statements into CSV

164 6 5

refchaser

A Python tool for checking reference lists in systematic reviews and literature reviews. It supports backward and forward reference searching by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, and downloads full-text PDFs of research articles in batch.

138 25 2

ragtable-extract

PDF table extraction for RAG and LLM — convert PDF tables to clean HTML. Fast, local, no GPU. Handles merged cells, line-wrapped text, no serialization.

125 1 0
