PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
felixdittrich92
docling-ocr-onnxtr

OnnxTR OCR plugin for Docling

26K 19 0
NameetP
pdfmux

PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.

4K 62 6
tiroq
mdify-cli

MDify is a document-to-Markdown conversion library for extracting structured content from complex PDFs and document images, including tables, charts, and scanned documents.

3K 0 0
docling-project
docling-graph

Transform unstructured documents into validated, rich and queryable knowledge graphs.

3K 139 21
versionHQ
versionhq

Autonomous agent networks for task automation with multi-step reasoning.

2K 30 10
ENDEVSOLS
longparser

Privacy-first document intelligence engine — converts PDFs, DOCX, PPTX, XLSX, and CSV into AI-ready Markdown + structured JSON for RAG pipelines.

2K 15 1
jspast
cells2table

Table image parsing with cell detection models

1K 0 0
shoryasethia
markdrop

A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered features for image and table analysis. Supports local files and URLs, preserves document structure, extracts high-quality images, detects tables using advanced ML models, and generates detailed content descriptions using multiple LLM providers including OpenAI GPT-4o, Google Gemini, Anthropic Claude, Groq, OpenRouter, and LiteLLM.

974 202 18
stevereiner
flexible-graphrag

Flexible GraphRAG system supporting multiple LLM providers, graph databases, vector stores, and data sources

906 123 27
DCC-BS
docling-glm-ocr

A Docling OCR plugin for GLM-OCR

795 9 0
ghodsizadeh
pdf2csv

A Python library and CLI tool to convert PDF files to CSV files.

642 42 5
aspose-cells-foss
aspose-cells-foss

High-performance Python Excel processing library with advanced conversion capabilities

519 9 0
aksarav
pdfstract

PDFStract - Extract, Chunking and Embedding Layer in Your RAG Pipeline - Available as CLI - WEBUI - API

504 146 12
DCC-BS
docling-pp-doc-layout

A Docling plugin for PaddlePaddle PP-DocLayout-V3 model document layout detection.

371 4 0
Sinapsis-AI
sinapsis-docling

Package to perform document conversion using Docling

260 0 0
aspose-cells-foss
aspose-cells-foss-for-python

High-performance Python Excel processing library with advanced conversion capabilities

251 9 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery