PyPI Stats

PDF Processing Python Packages

Python packages with the GitHub topic pdf-processing. Sorted by relevance, with stars and monthly downloads.
pdftl
pdftl

PDF CLI pipeline: merge, split, crop, rotate, compress, extract images, add text and more. Modern pdftk replacement, powered by pikepdf/qpdf.

4K 6 1
mcagriaksoy
safepdf

A safe PDF manipulation tool

445 6 1
hksorensen
diagram-detector

Production-ready diagram detection for academic papers using YOLO11

378 1 0
Kubenew
pdf2struct

`pdf2struct` extracts structured JSON from PDF documents.

363 1 0
Rekhet
revpdf

A triage-and-recovery toolkit for PDFs saved with incremental updates.

351 0 0
fujiba
llm-pdf-chunker

LLM-friendly PDF splitter & image optimizer. Chunk PDFs by size and downsample images for RAG/Bedrock.

329 0 0
PSPDFKit
nutrient-dws

Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion

276 54 1
Prathamesh-Ghatole
entityxtract

A provider-agnostic, entity-centric LLM-powered document entity extraction tool

226 1 1
MelinaNorton
journal-vetter

Python CLI & library for automated journal vetting — GPT‑4.1 summarization, YAML configuration, reproducible analysis.

151 1 0
Aleptonic
pdf-snip

A package to help manage pdf pages, images and their conversions during different NLP, CV or other tasks to avoid repetitive code blocks and give a simple function call to make it happen

107 3 0
Data from PyPI, GitHub, ClickHouse, and BigQuery