PyPI Stats

Document Parser Python Packages

Python packages with the GitHub topic document-parser. Sorted by relevance, with stars and monthly downloads.
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

5.3M 15K 1K
docling-project
docling-slim

Get your documents ready for gen AI

294K 59K 4K
graphlit
graphlit-client

Python client library for Graphlit Platform

32K 20 3
deepdoctection
deepdoctection

A Repo For Document AI

8K 3K 191
Marker-Inc-Korea
autorag

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

8K 5K 398
DanMeon
rhwp-python

PyO3 Python bindings for rhwp — parser and renderer for HWP/HWPX documents (Korean Hancom word processor format)

7K 4 1
Filimoa
openparse

Improved file parsing for LLMs

4K 3K 140
deepdoctection
dd-core

A Repo For Document AI

3K 3K 191
Unstructured-IO
unstructured-cpu

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

3K 15K 1K
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
deepdoctection
dd-datasets

A Repo For Document AI

2K 3K 191
nanonets
docstrange

Extract and convert data from any document, image, PDF, Word document, PPT file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 129
iamarunbrahma
vision-parse

Parse PDF documents into markdown formatted content using Vision LLMs

2K 469 66
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

948 38 4
decisionfacts
semantic-ai

An open-source framework for Retrieval-Augmented Generation (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

370 22 1
Pulkit12dhingra
automated-document-parser

A powerful and automated document parser built with LangChain for intelligent document processing. Automatically detects file types and uses appropriate loaders for PDF, DOCX, CSV, JSON, HTML, and more.

295 1 0
RevanKumarD
llamarker

A universal GenAI-based local parser for complex documents of all types.

273 1 0
DS4SD
docling-google-ocr

Get your documents ready for gen AI

186 59K 4K
anyparser
anyparser-crewai

Anyparser CrewAI Integration

181 2 0
marieai
marie-ai

Python library to integrate AI-powered features into your applications

156 89 11
decisionfacts
df-extract

DecisionFacts Extraction Library extracts content from PDF, PPTX, DOCX, PNG, and JPG files and converts it to structured JSON data.

140 14 0
docling-project
docling-enhanced

Get your documents ready for gen AI

79 59K 4K
docling-project
mseep-docling

Get your documents ready for gen AI

76 59K 4K
    • Data from PyPI, GitHub, ClickHouse, and BigQuery