PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics
h2non
jsonpath-ng

Finally, a JSONPath implementation for Python that aims to be standard compliant. That's all. Enjoy!

77.3M 730 112
docling-project
docling

Get your documents ready for gen AI

6M 59K 4K
deeplook
svglib

Read SVG files and convert them to other formats.

4.5M 362 85
docling-project
docling-slim

Get your documents ready for gen AI

206K 59K 4K
topk-io
topk-sdk

Provide the right context to your agents.

75K 70 3
signnow
signnow-python-sdk

Official SignNow SDK for Python. Sign documents, request eSignatures, and build role-based multi-signer workflows via REST API.

72K 12 7
konstantint
passporteye

Extraction of machine-readable zone information from passports, visas and id-cards via OCR

13K 446 122
karolzak
boxdetect

BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.

12K 113 21
Manu11-Pro
organisingfiles-by-type

This repo organises files by type. It uses Python to organise files so that users can get on with what they want to do instead of the tedious new folder, copy, cut, paste routine. It is also a good way to avoid losing your files in messy file heaps!

3K 1 0
ispras
dedoc

Dedoc is a library (service) for automating document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

2K 661 52
seanpedrick-case
doc-redaction

Redact PDF/image-based documents, Word, or CSV/XLSX files using a graphical user interface. Demo: https://huggingface.co/spaces/seanpedrickcase/document_redaction or try it with VLMs: https://huggingface.co/spaces/seanpedrickcase/document_redaction_vlm

2K 50 10
Hugues-DTANKOUO
olgadoc

Python bindings for Olga. PDF, DOCX, XLSX, HTML → Markdown and typed JSON, 15–40× faster than equivalent-quality OSS. Strictly-typed surface, no Any, one abi3 wheel for CPython 3.8+.

2K 6 0
phil65
docler

Abstractions & Tools for OCR / document processing

2K 5 2
openlegaldata
oldp

Open Legal Data Platform

1K 137 24
stanford-oval
churro-ocr

CHURRO is an OCR toolkit for historical document transcription, built to make handwritten and printed sources readable at high accuracy and lower cost.

1K 38 4
Aquilesorei
strutex

strutex is a Python library designed to extract JSON from documents.

826 10 0
usercando
pullcite

Evidence-backed structured extraction. Pull data from documents with proof of where each value came from.

705 1 0
tboy1337
pr2md

PR2MD is a powerful command-line tool that extracts GitHub Pull Request and Issue data and converts it into comprehensive, well-formatted Markdown documents. Perfect for documentation, archiving, code reviews, or offline analysis of pull requests.

703 1 0
mouraworks
docowling

SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for powering downstream workflows such as gen AI applications.

646 3 1
cyrildever
redacted-py

Redacting classified documents

545 3 0
nometria
medical-ocr

Multi-engine OCR pipeline for medical and legal documents

419 1 0
docling-project
docling-sdg

A set of tools to create synthetically-generated data from documents

417 45 17
anthonybench
sleepyconvert

Converts data files, images and documents to different formats

345 0 0
bradmontgomery
word2html

A quick and dirty script to convert a Word (docx) document to html.

328 54 12
Data from PyPI, GitHub, ClickHouse, and BigQuery