PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

81K 717 78
grammy-jiang
research-pipeline

Deterministic stage-based pipeline for searching, screening, downloading, converting, and summarizing academic papers. CLI + MCP server.

33K 1 0
NameetP
pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

4K 62 6
nanonets
docstrange

Extract and convert data from any document, image, PDF, Word doc, PPT, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 129
SakuraMathcraft
mathcraft-ocr

A Windows math workspace for screenshot OCR, handwriting-to-LaTeX, editing, preview, and symbolic computation, powered by MathCraft OCR and MathLive.

2K 155 14
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
iamarunbrahma
vision-parse

Parse PDF documents into Markdown-formatted content using Vision LLMs

2K 469 66
muchdogesec
file2txt

file2txt is a Python library that takes common file formats and turns them into plain text (a txt file) with Markdown styling.

1K 12 2
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

1K 38 4
shoryasethia
markdrop

A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

974 202 18
nanonets
llm-data-converter

Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract

900 7 1
wisupai
wisup-e2m

E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.

839 1K 72
altaidevorg
llm-food

Serving files for hungry LLMs

583 25 0
herrkaefer
anything2md

Python package and CLI for converting documents to Markdown using Cloudflare Workers AI toMarkdown.

321 1 0
nanonets
document-data-extractor

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

254 7 1
TylerMorrison21
paperflow-postprocess

Open-source PDF-to-Markdown post-processor with footnotes, LaTeX normalization, figure links, and YAML metadata. Supports Marker, MinerU, PyMuPDF, and Docling. Includes a self-hosted web UI.

215 19 2
drmingler
smart-llm-loader

A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document chunking and RAG applications. Features smart context-aware segmentation, multi-LLM support, and optimized content extraction for enhanced RAG performance.

105 76 3
markdownbridge
markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown

93 0 0
credeed
credeed-pdf-to-markdown

Convert PDFs to Markdown using AI so that agents can understand documents.

73 0 0
iamarunbrahma
multimodal-parser

Parse PDFs into markdown using Vision LLMs

1 465 66
    • Data from PyPI, GitHub, ClickHouse, and BigQuery