PyPI Stats

Layout Analysis Python Packages

Python packages with the GitHub topic layout-analysis. Sorted by relevance, with stars and monthly downloads.
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

289K 63K 5K
Layout-Parser
layoutparser

A Unified Toolkit for Deep Learning Based Document Image Analysis

162K 6K 533
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

78K 63K 5K
u9401066
asset-aware-mcp

Asset-Aware MCP Server — AI Agent precisely accesses tables, figures, sections from PDFs + .docx round-trip editing (DFM) with 46 tools / 13 resources, segmentation export, layout overlay, OCR preprocessing, knowledge graph (LightRAG)

9K 0 1
breezedeus
pix2text

An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.

9K 3K 270
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

5K 63K 5K
RapidAI
rapid-layout

Layout analysis for Chinese and English documents

4K 269 21
yuvaraj3855
preocr

A fast, layout-aware OCR decision engine for document processing pipelines. Detects whether files truly require OCR before expensive processing, reducing unnecessary OCR calls while preserving extraction reliability.

3K 10 4
yoshihikoueno
pdf-layout-scanner

A more complete example of programming with PDFMiner, which continues where the default documentation stops

1K 7 3
mindspore-lab
mindocr

A toolbox of OCR models and algorithms based on MindSpore

600 301 62
Kubenew
pdf2struct

`pdf2struct` extracts structured JSON from PDF documents.

363 1 0
Magnet-AI
quanta-pdf

Advanced PDF layout analysis engine for extracting figures, tables, and structured content from complex engineering documents using computer vision and machine learning.

333 2 1
opendatalab
xh-pdf-parser

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

180 63K 5K
MBAigner
pdfsegmenter

This library builds a graph representation of the content of PDFs. The graph is then clustered, and the resulting page segments are classified and returned. Tables are returned formatted as CSV.

156 23 3
sulzbals
ocrd-gbn

OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil

126 11 0
mindspore-lab
opensourcedot-mindocr

A toolbox of OCR models and algorithms based on MindSpore.

99 301 62
ixalodecte
filestruct

A Python package for structuring files using visual and style information

99 1 0
opendatalab
lazyllm-magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

96 63K 5K
chigwell
text2design

A new package that enables users to input textual descriptions of visual design, layout, or interface concepts and returns structured representations or annotations derived from the description. It le

88 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery