PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.2M 15K 1K
docling-project
docling-slim

Get your documents ready for gen AI

206K 59K 4K
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

81K 717 78
Unstructured-IO
unstructured-cpu

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

3K 15K 1K
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

1K 38 4
shoryasethia
markdrop

A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

974 202 18
baughmann
tikara

The metadata and text content extractor for almost every file type.

537 9 0
asiff00
bangla-pdf-ocr

Bangla PDF to text converter that works on Windows, macOS, and Linux without any extra downloads or configurations.

227 21 3
DS4SD
docling-google-ocr

Get your documents ready for gen AI

179 59K 4K
zevio
pcu-io

IO management for PCU project

106 0 0
zevio
pcu-pdf

PDF parser component (Apache Tika) for PCU project

105 1 0
docling-project
docling-enhanced

Get your documents ready for gen AI

71 59K 4K
docling-project
mseep-docling

Get your documents ready for gen AI

71 59K 4K
DS4SD
extended-docling

Get your documents ready for gen AI

66 59K 4K

Data from PyPI, GitHub, ClickHouse, and BigQuery