PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

282K 62K 5K
Layout-Parser
layoutparser

A Unified Toolkit for Deep Learning Based Document Image Analysis

155K 6K 533
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

77K 62K 5K
breezedeus
pix2text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

9K 3K 270
u9401066
asset-aware-mcp

Asset-Aware MCP Server — AI Agent precisely accesses tables, figures, sections from PDFs + .docx round-trip editing (DFM) with 46 tools / 13 resources, segmentation export, layout overlay, OCR preprocessing, knowledge graph (LightRAG)

5K 0 0
RapidAI
rapid-layout

Analysis of Chinese and English layouts

4K 268 21
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

3K 62K 5K
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
yoshihikoueno
pdf-layout-scanner

A more complete example of programming with PDFMiner, which continues where the default documentation stops

1K 7 3
mindspore-lab
mindocr

A toolbox of OCR models and algorithms based on MindSpore.

544 300 62
Magnet-AI
quanta-pdf

Advanced PDF layout analysis engine for extracting figures, tables, and structured content from complex engineering documents using computer vision and machine learning.

255 2 1
opendatalab
xh-pdf-parser

A practical tool for converting PDF to Markdown

167 62K 5K
MBAigner
pdfsegmenter

This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified and returned. Tables are retrieved formatted as a CSV.

117 23 3
sulzbals
ocrd-gbn

Collection of OCR-D compliant tools for layout analysis and segmentation of historical German-language documents published in Brazil

98 11 0
ixalodecte
filestruct

A Python package to structure files using visual and style information

87 1 0
mindspore-lab
opensourcedot-mindocr

A toolbox of OCR models and algorithms based on MindSpore.

72 300 62
chigwell
text2design

A new package that enables users to input textual descriptions of visual design, layout, or interface concepts and returns structured representations or annotations derived from the description. It le

61 1 0
opendatalab
lazyllm-magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

60 62K 5K
Data from PyPI, GitHub, ClickHouse, and BigQuery