PyPI Stats

PDF to Markdown Python Packages

Python packages with the GitHub topic pdf-to-markdown. Sorted by relevance, with stars and monthly downloads.
yfedoseev
pdf-oxide

The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

92K 717 78
grammy-jiang
research-pipeline

Deterministic stage-based pipeline for searching, screening, downloading, converting, and summarizing academic papers. CLI + MCP server.

34K 1 0
NameetP
pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

3K 62 6
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
SakuraMathcraft
mathcraft-ocr

A Windows math workspace for screenshot OCR, handwriting-to-LaTeX, editing, preview, and symbolic computation, powered by MathCraft OCR and MathLive.

2K 155 14
nanonets
docstrange

Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 129
iamarunbrahma
vision-parse

Parse PDF documents into markdown formatted content using Vision LLMs

2K 469 66
muchdogesec
file2txt

file2txt is a Python library that takes common file formats and turns them into plain text (a txt file) with Markdown styling.

1K 12 2
nanonets
llm-data-converter

Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract

1K 7 1
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

948 38 4
shoryasethia
markdrop

A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

941 202 18
wisupai
wisup-e2m

E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.

868 1K 72
altaidevorg
llm-food

Serving files for hungry LLMs

690 25 0
herrkaefer
anything2md

Python package and CLI for converting documents to Markdown using Cloudflare Workers AI toMarkdown.

352 1 0
nanonets
document-data-extractor

Convert any document format into LLM-ready data format (markdown) with advanced intelligent document processing capabilities powered by pre-trained models.

269 7 1
TylerMorrison21
paperflow-postprocess

Open-source PDF-to-Markdown post-processor with footnotes, LaTeX normalization, figure links, and YAML metadata. Supports Marker, MinerU, PyMuPDF, and Docling. Includes a self-hosted web UI.

231 19 2
drmingler
smart-llm-loader

A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document chunking and RAG applications. Features smart context-aware segmentation, multi-LLM support, and optimized content extraction for enhanced RAG performance.

132 76 3
markdownbridge
markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown

105 0 0
credeed
credeed-pdf-to-markdown

Convert PDF to Markdown using AI, so that agents can understand documents.

80 0 0
iamarunbrahma
multimodal-parser

Parse PDFs into markdown using Vision LLMs

1 465 66
Data from PyPI, GitHub, ClickHouse, and BigQuery