PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

282K 62K 5K
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

77K 62K 5K
Alex8791-cyber
cognithor

Cognithor · Agent OS: Local-first autonomous agent operating system. 19 LLM providers, 18 channels, 145 MCP tools, 6-tier memory, Agent Packs marketplace, zero telemetry. Python 3.12+, Apache 2.0.

11K 121 18
retab-dev
retab

The developer starter pack for document processing

11K 42 2
zoharbabin
dd-agents

Find what gets buried in the data room. Open-source integrated M&A due diligence — 9 specialist domains across every contract, cross-referenced with exact citations.

11K 11 6
Topdu
openocr-python

OpenOCR: An Open-Source Toolkit for General-OCR Research and Applications, integrates a unified training and evaluation benchmark, commercial-grade OCR and Document Parsing systems, and faithful reproductions of the core implementations from a wide range of academic papers.

6K 1K 125
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

3K 62K 5K
yuvaraj3855
preocr

Fast document classification and OCR detection. Analyzes any file type to determine if OCR is needed, saving time and money on unnecessary processing.

3K 10 4
tiroq
mdify-cli

MDify is a document-to-Markdown conversion library for extracting structured content from complex PDFs and document images, including tables, charts, and scanned documents.

3K 0 0
ispras
dedoc

Dedoc is a library (service) for automating document parsing and converting documents to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

2K 661 52
LATIS-DocumentAI-Group
documentai-std

DocumentAI-std is a Python library designed to facilitate and standardize document analysis and processing tasks. It offers functionality for handling document elements, performing optical character recognition (OCR), and managing document datasets.

2K 3 0
AdemBoukhris457
doctra

Parse, extract, and analyze documents with ease

1K 204 33
UiForm
uiform

UiForm official python library

1K 42 2
xyntopia
pydoxtools

This library contains a set of tools in order to extract and synthesize structured information from documents

541 87 14
nanonets
docext

An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

431 2K 143
lazyFrogLOL
llmdocparser

Uses an LLM to parse PDFs and produce better chunks for retrieval

428 269 9
Retab-dev
k-llms

The developer starter pack for document processing

369 42 2
ahmetkumass
contract-analyzer

Open-source tool for extracting and analyzing key information from legal contracts and documents with ease.

304 12 1
acsenrafilho
cucaracha

A bureaucratic cockroach (cucaracha) assistant to help with document processing and analysis

259 1 1
ZeroBone
officialeye

An advanced AI-powered generic document-analysis tool

242 7 3
olaflaitinen
thulium-htr

Thulium is a production-ready Python library for offline handwritten text recognition (HTR) supporting 52+ languages across Latin, Cyrillic, Greek, Arabic, Hebrew, Devanagari, Chinese, Japanese, Korean, and Georgian scripts.

204 8 0
jiahuidegit
doc-mcp-server

Lets AI understand any complex document: a universal MCP server that addresses AI context-limit problems

189 2 0
opendatalab
xh-pdf-parser

A practical tool for converting PDF to Markdown

167 62K 5K
FitLayout
flclient

Python client library for the FitLayout REST API

131 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery