Document AI Python Packages

deepdoctection

A Repo For Document AI

8K 3K 191

exstruct

Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.

6K 141 22

nanoindex

Agentic RAG harness for long documents, with tree- and graph-based reasoning. Cited answers down to the pixel.

5K 49 5

dd-core

A Repo For Document AI

3K 3K 191

mdify-cli

MDify is a document-to-Markdown conversion library for extracting structured content from complex PDFs and document images, including tables, charts, and scanned documents.

3K 0 0

dd-datasets

A Repo For Document AI

2K 3K 191

donut-python

Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022

1K 7K 560

flash-mineru

Fast Inference Architecture for MinerU

793 49 7

docvision

Production-ready document parsing with Vision Language Models

489 1 0

grepctl

BigQuery Semantic Search Orchestrator

464 4 0

german-ocr

High-performance German document OCR - Local & Cloud with GPU/CPU support

439 94 6

optical-context-mcp

MCP server that compresses OCR-heavy PDFs into dense packed images so AI agents can handle long visual documents

248 1 0

gdoczai

GDocz by Gramosoft is an open-source Intelligent Document Processing platform that turns raw PDFs and images into clean, structured JSON, powered by multi-engine OCR and AI-driven schema extraction.

207 6 1

iflow-mcp-harumiweb-exstruct

Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines

86 141 22

doc-vision-parser

Python library for intelligent document parsing using Vision Language Models. Extract structured text and markdown from PDFs and images with self-correcting AI workflows. Supports OpenAI-compatible APIs.

5 1 0
