PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
nolze
msoffcrypto-tool

Python tool and library for decrypting and encrypting MS Office files using passwords or other keys

8.7M 616 91
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.2M 15K 1K
pqzx
htmldocx

Convert HTML to DOCX

895K 87 59
dfop02
html-for-docx

Convert HTML to DOCX

286K 61 15
opendatalab
mineru

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

282K 62K 5K
docling-project
docling-slim

Get your documents ready for gen AI

206K 59K 4K
opendatalab
magic-pdf

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

77K 62K 5K
yobix-ai
extractous

Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.

63K 2K 96
shloktech
md2docx-python

A simple and straightforward Python utility that converts a Microsoft Word document (`.docx`) to a Markdown file (`.md`) and vice versa. It supports multiple Markdown elements, including headings, bold and italic text, both unordered and ordered lists, and many more.

15K 47 5
Luizhcrs
template-engine-ia

Document normalization engine — learn a template from examples and convert any document automatically via LLM

10K 1 0
shcherbak-ai
contextgem

ContextGem: Effortless LLM extraction from documents

9K 2K 155
BramAlkema
openxml-audit

Validate Office files in pure Python with Open XML SDK parity, pytest fixtures, and CI hooks.

7K 1 0
PlateerLab
document-adapter

A unified adapter and MCP server that lets LLMs directly edit DOCX/PPTX/HWPX documents. Compatible with Claude Desktop, Claude Code, and Anthropic API Tool Use. pip install document-adapter

6K 0 0
badbye
docxpy

A pure-Python utility to extract text and images from docx files.

6K 5 4
sunholo-data
ailang-parse

Universal document parsing and generation in AILANG. Deterministic Office (DOCX/PPTX/XLSX) extraction, AI-powered PDF/image parsing, 9-format document generation.

6K 0 0
ykarapazar
word-mcp-live

The only MCP server that edits Word documents while they're open — 114 tools, live editing, tracked changes, per-action undo

6K 64 17
explosion
spacy-layout

📚 Process PDFs, Word documents and more with spaCy

5K 894 64
u9401066
asset-aware-mcp

Asset-Aware MCP Server — AI Agent precisely accesses tables, figures, sections from PDFs + .docx round-trip editing (DFM) with 46 tools / 13 resources, segmentation export, layout overlay, OCR preprocessing, knowledge graph (LightRAG)

5K 0 0
farfarfun
funread

Document reading and parsing toolkit: supports reading and parsing multiple document formats

4K 1 0
rocklambros
any2md

Convert PDF, DOCX, HTML, and TXT files — or web pages by URL — to clean, LLM-optimized Markdown with YAML frontmatter.

4K 15 2
henrihapponen
docxedit

Edit Word (.docx) documents effortlessly without changing the original formatting.

4K 23 3
opendatalab
mineru-selfhosted-mcp

Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.

3K 62K 5K
turulomio
unogenerator

Generate LibreOffice files programmatically with Python and LibreOffice server instances

3K 15 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery