18 dependents
Description Downloads/month
Structured text extraction framework for digital and scanned PDFs with inline fo... 4K
AgentSociety 2 is a modern, LLM-native agent simulation platform designed for so... 3K
OmniDocs📄 - One stop visual document processing framework 537
A tool for parsing PDF document layouts and chunking content 504
A Python library for extracting and analyzing content from any documents, suppor... 491
Docket Analyzer OCR Utility 462
A simple and efficient RAG (Retrieval-Augmented Generation) library with Knowled... 448
This package enables Retrieval-Augmented Generation (RAG) for PDF documents, enh... 330
A powerful tool to extract text, tables, charts, and formulas from documents and... 329
Crop image/table/code regions from PDF files and export metadata 321
A tool for parsing PDF document layouts and chunking content. 264
OCR tool for botanical documents using layout analysis and LLMs/OCR engines. 196
Using GPT to parse PDF files and generate LaTeX code. 188
A Comprehensive Toolkit for High-Quality PDF Content Extraction. 176
A practical tool for converting PDF to Markdown 167
DocRag: An advanced document search and retrieval system leveraging Retrieval-Au... 153
An AI companion for reading papers. 132
116