PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
opendataloader-project
opendataloader-pdf

PDF Parser for AI-ready data. Automate PDF accessibility. Open-source.

105K 20K 2K
run-llama
liteparse

A fast, helpful, and open-source document parser

30K 5K 326
reactor-no8
neots

NeoTextSynthesizer is a high-performance OCR training data generator.

21K 1 0
rtr46
meikiocr

High-speed, high-accuracy, local OCR for Japanese video games

6K 75 3
StabRise
scaledp

ScaleDP is an Open-Source extension of Apache Spark for Document Processing

5K 18 1
opendataloader-project
langchain-opendataloader-pdf

A LangChain integration for OpenDataLoader PDF

3K 32 3
bropines
chrome-lens-py

Python library for using Google Lens OCR for free, via the API used in Chromium.

3K 62 8
LATIS-DocumentAI-Group
documentai-std

DocumentAI-std is a Python library designed to facilitate and standardize document analysis and processing tasks. It offers functionality for handling document elements, performing optical character recognition (OCR), and managing document datasets.

2K 3 0
gnana70
ocr-tamil

Python Tamil OCR package

1K 86 15
clovaai
synthtiger

Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021

1K 575 109
emedvedev
aocr

Optical character recognition model for Tensorflow based on Visual Attention.

644 1K 251
Anish-M-code
pdftotext3

A simple pdftotext conversion tool for Windows 8.1/10/11 and Fedora/Ubuntu/Debian/Arch-based Linux distros, using poppler-utils and Google's tesseract-ocr.

342 22 2
StabRise
pyspark-pdf

PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them

291 81 4
Joshuaatanu
ocr-genai-beta

A package for performing OCR and interpreting the output using OpenAI and Gemini models.

257 4 0
nuhmanpk
pyplatex

Simple, scalable, and ready-to-use ANPR package for Automatic Number Plate Recognition

248 6 0
microsoft
genalog

Tools for generating analog document (images) from raw text

223 355 34
VerisimilitudeX
ocr-pdf2txt

Use Optical Character Recognition technology to convert scanned PDFs into TXT files locally.

210 1 0
M4cs
twohundrediq

HQ Trivia Bot for Windows Using LonelyScreen

160 4 1
ianzhao05
textshot

Python tool for grabbing text via screenshot

157 2K 256
digidigital
ocrtestdata

Generate large amounts of image-based PDF test data for file-based OCR and Document Management Solutions.

125 0 0
tjkessler
tesseract-positional

Tool to save positional OCR data to a text file

122 0 0
snakers4
silero-ocr

Simple optical character recognition (OCR) by Silero

99 0 0
hanifabd
pisahkan-ktp

Python package for detecting information from a KTP (Indonesian ID card)

89 10 3
Data from PyPI, GitHub, ClickHouse, and BigQuery