Dependents of pdfminer-six

336 dependents

Package	Description	Downloads/month
pdfplumber	Plumb a PDF for detailed information about each char, rectangle, line, et cetera...	27.6M
openhands-aci	An Agent-Computer Interface (ACI) designed for software development agents OpenH...	1.4M
polyfile-weave	A pure Python cleanroom implementation of libmagic, with instrumented parsing fr...	1.2M
camelot-py	A Python library to extract tabular data from PDFs	851K
ocrmypdf	OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search...	835K
rpaframework-pdf	Collection of open-source libraries and tools for Robotic Process Automation (RP...	471K
textract	extract text from any document. no muss. no fuss.	381K
mineru	Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J...	282K
openviking	OpenViking is an open-source context database designed specifically for AI Agent...	211K
typecode	TypeCode provides comprehensive filetype and mimetype detection using multiple d...	81K
magic-pdf	Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J...	77K
scancode-toolkit	ScanCode detects licenses, copyrights, dependencies by "scanning code" ......	74K
textract-py3	Maintained fork of deanmalmgren/textract to replace '*' dependencies and other u...	53K
docassemble-base	A free, open-source expert system for guided interviews and document assembly, b...	46K
auto-coder	AutoCoder: AutoCoder	37K
markitdown-ocr	Python tool for converting files and office documents to Markdown.	29K
docassemble-webapp	A free, open-source expert system for guided interviews and document assembly, b...	21K
llamabot	Pythonic class-based interface to LLMs	17K
pdf2zh	[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - based on A...	17K
restai-core	RESTAI, so many 'A's and 'I's, so little time...	17K
pdftitle	a utility to extract the title from a PDF file	14K
aient	Aient: The Awakening of Agent.	14K
casparser	Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfint...	14K
arxiv2text	Converting PDF files to text, mainly with a focus on arXiv papers.	13K
polyfile	A pure Python cleanroom implementation of libmagic, with instrumented parsing fr...	13K
pypdf-table-extraction	A Python library to extract tabular data from PDFs	12K
aiecs	AI Execute Services - A middleware framework for AI-powered task execution and t...	9K
modelmerge	modelmerge is a multi-large language model API aggregator.	9K
beswarm	MAS	9K
pdfannots	Extracts and formats text annotations from a PDF file	8K
mtxai	A web scraping library based on LangChain which uses LLM and direct graph logic ...	8K
pyresparser	A simple resume parser used for extracting information from resumes	7K
credsweeper	CredSweeper is a tool to detect credentials in any directories or files. CredSwe...	6K
botrun-flow-lang	A flow language for botrun	6K
predacore	PredaCore — the apex autonomous agent. Hybrid Rust memory kernel, topped LongMem...	6K
gulagcleaner	Ad removal tool for PDFs.	6K
gecko-core	Builder Bootstrap Platform SDK — pure business logic. CLI, MCP, and API import t...	6K
micro-cc	Harness that gives frontier models full system access — shell, filesystem, brows...	6K
iocsearcher	A library and command line tool for extracting indicators of compromise (IOCs) f...	5K
papers-dl	A command line application for downloading scientific papers.	5K
foundry-mcp	foundry-mcp	5K
topicexplorer	InPhO Topic Explorer	5K
scancode-toolkit-mini	ScanCode detects licenses, copyrights, dependencies by "scanning code" ......	5K
openparse	Improved file parsing for LLMs	4K
reviewboardpowerpack	Enhances Review Board with PDF review and diffing, reports and analytics, new so...	4K
organize-tool	The file management automation tool.	4K
pdf2txt	A better pdf to text extraction toolkit	4K
biolit	LLM-assisted biomedical literature screening and structured extraction. Supports...	4K
protollm	Framework for prototyping of LLM-based applications	4K
llmvm-cli	LLM <-> Python agentic runtime prototype	3K