PDF to Text Python Packages

docling

Get your documents ready for gen AI

6M 59K 4K

unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.2M 15K 1K

docling-slim

Get your documents ready for gen AI

206K 59K 4K

pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

81K 717 78

unstructured-cpu

3K 15K 1K

olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0

churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

1K 38 4

markdrop

A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

974 202 18

tikara

The metadata and text content extractor for almost every file type.

537 9 0

bangla-pdf-ocr

Bangla PDF to text converter that works on Windows, macOS, and Linux without any extra downloads or configurations.

227 21 3

docling-google-ocr

Get your documents ready for gen AI

179 59K 4K

pcu-io

IO management for PCU project

106 0 0

pcu-pdf

PDF parser component (Apache Tika) for PCU project

105 1 0

docling-enhanced

Get your documents ready for gen AI

71 59K 4K

mseep-docling

Get your documents ready for gen AI

71 59K 4K

extended-docling

Get your documents ready for gen AI

66 59K 4K
