203 dependents
Package description  Downloads/month
Plumb a PDF for detailed information about each char, rectangle, line, et cetera... 28.1M
A Python client for the Unstructured Platform API 9.5M
A library for performing inference using trained models. 1.1M
A Python library to extract tabular data from PDFs 866K
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search... 842K
OCR, layout analysis, reading order, table recognition in 90+ languages 799K
Extract structured text from pdfs quickly 401K
docTR (Document Text Recognition) - a seamless, high-performing & accessible lib... 289K
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... 282K
Abstra Lib 104K
OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for s... 78K
NeMo Retriever Library is a scalable, performance-oriented document content and ... 74K
Reflective memory for AI agents 65K
Graph execution engine for agentic AI workflows. 58K
Toolkit for linearizing PDFs for LLM datasets/training 41K
Mindee API Helper Library for Python 31K
NeMo Retriever Library is a scalable, performance-oriented document content and ... 31K
30K
Math OCR model that outputs LaTeX and markdown 21K
Python library for converting PDF and images to and from Zebra Programming Langu... 21K
OCR model that converts documents to markdown, HTML, or JSON. 20K
Seti Astro Suite Pro 19K
Lightweight, performant, deep table extraction 13K
Command Line Interface (CLI) for bulk processing/loading data into RegScale 13K
A modern RAG ingestion pipeline from Nvidia 12K
Declarative language for composable AI workflows. Devtool for agents and mere hu... 11K
Detect and extract tables to markdown and csv 10K
"Vibe-Trading: Your Personal Trading Agent" 10K
Data Infrastructure providing a declarative, incremental approach for multimodal... 10K
Implementation of Nougat Neural Optical Understanding for Academic Documents 8K
Builder Bootstrap Platform SDK — pure business logic. CLI, MCP, and API import t... 7K
Fast, multimodal context for agents. 7K
YomiToku is an AI-powered document image analysis package designed specifically ... 7K
Library to extract data from files and documents agnostically using LLMs 6K
Extract structured text from pdfs quickly 6K
An Apache-licensed package for extracting, creating, filling, and flattening PDF... 6K
GenAI Processors is a lightweight Python library that enables efficient, paralle... 5K
Easiest way to give context to LLMs; Attachments has the ambition to be the gene... 5K
Graph execution engine for agentic AI workflows. 5K
Structured text extraction framework for digital and scanned PDFs with inline fo... 4K
4K
Extract structural RC models from IFC files into the elemental-engine JSON contr... 4K
SDK for creating ISCCs (International Standard Content Codes) 3K
LLM <-> Python agentic runtime prototype 3K
data-intelligence-mcp-server is a centralized Model Context Protocol (MCP) serve... 3K
Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuni... 3K
Encapsulating Apache Spark for Easy Usage 2K
Markdown grounded to PDF bounding boxes via VLM + Smith-Waterman alignment 2K
Modality-Aware Retrieval Engine inspired by IRPAPERS-style multimodal retrieval 2K
XAIL Data quality 2K