PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
py-pdf
pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

56.5M 10K 2K
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

27.6M 10K 878
py-pdf
pypdf2

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.2M 10K 2K
jstockwin
py-pdf-parser

A Python tool to help extracting information from structured PDFs.

28K 430 49
harubi
bolivar

High-performance PDF table extraction library. Bindings for Python and JVM.

14K 1 0
marcpinet
citracer

💬 Trace citation chains for any concept across research papers and render them as an interactive graph.

5K 19 0
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
IQDM
iqdmpdf

A collection of PDF data mining scripts for various IMRT QA vendors

1K 13 2
OpenDCAI
flash-mineru

Fast Inference Architecture for MinerU

793 49 7
py-pdf
pypdf-fork

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

449 10K 2K
jsvine
pdfplumber-aemc

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

397 10K 878
meldonization
depdf

PDF table & paragraph extractor

222 11 0
jeremiahbohr
literature-mapper

Transform academic PDFs into a Knowledge Graph with typed claims, temporal analysis, bibliometric tools, and grounded LLM synthesis that cites only your corpus.

175 8 3
Halolegend94
pdf4py

A PDF parser written in Python3 with no external dependencies.

173 56 3
J-sephB-lt-n
pdf-bank-statement-parser

Command-line tool for converting PDF bank statements into CSV

164 6 5
DQ-Zhang
refchaser

Written in python, for checking reference lists in systematic reviews and literature reviews, helps with reference list searching both backward & forward by extracting references and creating search queries, ranks articles by relevance to improve screening efficiency, downloads full-text pdf of research articles in batch.

138 25 2
ZhuJiaxin2
ragtable-extract

PDF table extraction for RAG and LLM — convert PDF tables to clean HTML. Fast, local, no GPU. Handles merged cells, line-wrapped text, no serialization.

125 1 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery