PyPI Stats

PDF Python Packages

Python packages with the GitHub topic pdf. Sorted by relevance, with stars and monthly downloads.
pymupdf
pymupdf

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

78.5M 10K 718
py-pdf
pypdf

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

57.3M 10K 2K
pdfminer
pdfminer-six

Community maintained fork of pdfminer - we fathom PDF

41.7M 7K 1K
pypdfium2-team
pypdfium2

Python bindings to PDFium, reasonably cross-platform.

37.7M 759 43
jsvine
pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.

28.1M 10K 878
py-pdf
pypdf2

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

25.3M 10K 2K
Kozea
weasyprint

The awesome document factory

24.5M 9K 807
lukasschwab
arxiv

Python wrapper for the arXiv API

19.6M 1K 153
CourtBouillon
pydyf

A low-level PDF creator

19.6M 144 11
Belval
pdf2image

A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects

15.8M 2K 212
Kozea
cairosvg

Convert your vector images

14.1M 918 160
py-pdf
fpdf2

Simple PDF generation for Python

9.6M 2K 341
JessicaTegner
pypandoc-binary

Thin wrapper for "pandoc" (MIT)

8.2M 1K 120
pikepdf
pikepdf

A Python library for reading and writing PDF, powered by QPDF

8.2M 3K 223
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
MatthiasValvekens
pyhanko

pyHanko: sign and stamp PDF files

5.9M 715 101
microsoft
markitdown

Python tool for converting files and office documents to Markdown.

5.3M 121K 8K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.3M 15K 1K
JessicaTegner
pypandoc

Thin wrapper for "pandoc" (MIT)

4.9M 1K 120
deeplook
svglib

Read SVG files and convert them to other formats.

4.5M 362 85
pymupdf
pymupdfb

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

4.4M 10K 718
MatthiasValvekens
pyhanko-certvalidator

pyHanko: sign and stamp PDF files

4.3M 715 101
xhtml2pdf
xhtml2pdf

A library for converting HTML into PDFs using ReportLab

3.5M 2K 655
chezou
tabula-py

Simple wrapper of tabula-java: extract tables from PDFs into pandas DataFrames

2.3M 2K 304
Data from PyPI, GitHub, ClickHouse, and BigQuery