PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
Unstructured-IO
unstructured

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

5.2M 15K 1K
docling-project
docling-slim

Get your documents ready for gen AI

206K 59K 4K
graphlit
graphlit-client

Python client library for Graphlit Platform

32K 20 3
Marker-Inc-Korea
autorag

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

9K 5K 398
deepdoctection
deepdoctection

A Repo For Document AI

8K 3K 191
DanMeon
rhwp-python

PyO3 Python bindings for rhwp — parser and renderer for HWP/HWPX documents (Korean Hancom word processor format)

5K 4 1
Filimoa
openparse

Improved file parsing for LLMs

4K 3K 140
deepdoctection
dd-core

A Repo For Document AI

3K 3K 191
Unstructured-IO
unstructured-cpu

Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.

3K 15K 1K
nanonets
docstrange

Extract and convert data from any document, image, PDF, Word document, PPT file, or URL into multiple formats (Markdown, JSON, CSV, HTML) with intelligent structured data extraction and advanced OCR.

2K 1K 129
deepdoctection
dd-datasets

A Repo For Document AI

2K 3K 191
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
iamarunbrahma
vision-parse

Parse PDF documents into markdown formatted content using Vision LLMs

2K 469 66
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

1K 38 4
decisionfacts
semantic-ai

An open-source framework for Retrieval-Augmented Systems (RAG) that uses semantic search to retrieve the expected results and generates human-readable conversational responses with the help of an LLM (Large Language Model).

343 22 1
Pulkit12dhingra
automated-document-parser

A powerful and automated document parser built with LangChain for intelligent document processing. Automatically detects file types and uses appropriate loaders for PDF, DOCX, CSV, JSON, HTML, and more.

290 1 0
RevanKumarD
llamarker

A universal GenAI-based local parser for complex documents of all types.

200 1 0
DS4SD
docling-google-ocr

Get your documents ready for gen AI

179 59K 4K
anyparser
anyparser-crewai

Anyparser CrewAI Integration

154 2 0
marieai
marie-ai

Python library to integrate AI-powered features into your applications

136 89 11
decisionfacts
df-extract

DecisionFacts Extraction Library extracts content from PDF, PPTX, DOCX, PNG, and JPG files and converts it to structured JSON data.

132 14 0
docling-project
docling-enhanced

Get your documents ready for gen AI

71 59K 4K
docling-project
mseep-docling

Get your documents ready for gen AI

71 59K 4K
Data from PyPI, GitHub, ClickHouse, and BigQuery