Dependents of pypdfium2

203 dependents

Package	Description	Downloads/month
pdfplumber	Plumb a PDF for detailed information about each char, rectangle, line, et cetera...	28.1M
unstructured-client	A Python client for the Unstructured Platform API	9.5M
unstructured-inference	A library for performing inference using trained models.	1.1M
camelot-py	A Python library to extract tabular data from PDFs	866K
ocrmypdf	OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search...	842K
surya-ocr	OCR, layout analysis, reading order, table recognition in 90+ languages	799K
pdftext	Extract structured text from pdfs quickly	401K
python-doctr	docTR (Document Text Recognition) - a seamless, high-performing & accessible lib...	289K
mineru	Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J...	282K
abstra	Abstra Lib	104K
onnxtr	OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for s...	78K
nv-ingest-api	NeMo Retriever Library is a scalable, performance-oriented document content and ...	74K
keep-skill	Reflective memory for AI agents	65K
graphon	Graph execution engine for agentic AI workflows.	58K
olmocr	Toolkit for linearizing PDFs for LLM datasets/training	41K
mindee	Mindee API Helper Library for Python	31K
nv-ingest	NeMo Retriever Library is a scalable, performance-oriented document content and ...	31K
doc2dict		30K
texify	Math OCR model that outputs LaTeX and markdown	21K
zebrafy	Python library for converting PDF and images to and from Zebra Programming Langu...	21K
chandra-ocr	OCR model that converts documents to markdown, HTML, or JSON.	20K
setiastrosuitepro	Seti Astro Suite Pro	19K
gmft	Lightweight, performant, deep table extraction	13K
regscale-cli	Command Line Interface (CLI) for bulk processing/loading data into RegScale	13K
nemo-retriever	A modern RAG ingestion pipeline from Nvidia	12K
pipelex	Declarative language for composable AI workflows. Devtool for agents and mere hu...	11K
tabled-pdf	Detect and extract tables to markdown and csv	10K
vibe-trading-ai	"Vibe-Trading: Your Personal Trading Agent"	10K
pixeltable	Data Infrastructure providing a declarative, incremental approach for multimodal...	10K
nougat-ocr	Implementation of Nougat Neural Optical Understanding for Academic Documents	8K
gecko-core	Builder Bootstrap Platform SDK — pure business logic. CLI, MCP, and API import t...	7K
mm-ctx	Fast, multimodal context for agents.	7K
yomitoku	YomiToku is an AI-powered document image analysis package designed specifically ...	7K
extract-thinker	Library to extract data from files and documents agnostically using LLMs	6K
chunking-pdftext	Extract structured text from pdfs quickly	6K
formalpdf	An Apache-licensed package for extracting, creating, filling, and flattening PDF...	6K
genai-processors	GenAI Processors is a lightweight Python library that enables efficient, paralle...	5K
attachments	Easiest way to give context to LLMs; Attachments has the ambition to be the gene...	5K
graphon-local	Graph execution engine for agentic AI workflows.	5K
paradox-pdf	Structured text extraction framework for digital and scanned PDFs with inline fo...	4K
megaparse		4K
elemental-ingest	Extract structural RC models from IFC files into the elemental-engine JSON contr...	4K
iscc-sdk	SDK for creating ISCCs (International Standard Content Codes)	3K
llmvm-cli	LLM <-> Python agentic runtime prototype	3K
ibm-watsonx-data-intelligence-mcp-server	data-intelligence-mcp-server is a centralized Model Context Protocol (MCP) serve...	3K
kiln-ai	Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuni...	3K
xursparks	Encapsulating Apache Spark for Easy Usage	2K
groundmark	Markdown grounded to PDF bounding boxes via VLM + Smith-Waterman alignment	2K
mare-retrieval	Modality-Aware Retrieval Engine inspired by IRPAPERS-style multimodal retrieval	2K
xurpas-data-quality	XAIL Data quality	2K