A Python asyncio wrapper for Tesseract-OCR.
Fast and memory-efficient Python PDF parser based on the xpdf sources.
Converts a whole subdirectory of PDF documents, large or small, into a dataset (a pandas DataFrame), with error tracking and a choice of features.
A simple pdftotext conversion tool for Windows 8.1/10/11 and Fedora-, Ubuntu-, Debian-, and Arch-based Linux distros, using poppler-utils and Google's tesseract-ocr.
Converts an image to a CSV. This exists because Chorus 3.0 are bat-shit and only show images for vital metadata.