203 dependents
| Description | Downloads/month |
|---|---|
| Plumb a PDF for detailed information about each char, rectangle, line, et cetera... | 28.1M |
| A Python client for the Unstructured Platform API | 9.5M |
| A library for performing inference using trained models. | 1.1M |
| A Python library to extract tabular data from PDFs | 866K |
| OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search... | 842K |
| OCR, layout analysis, reading order, table recognition in 90+ languages | 799K |
| Extract structured text from pdfs quickly | 401K |
| docTR (Document Text Recognition) - a seamless, high-performing & accessible lib... | 289K |
| Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... | 282K |
| Abstra Lib | 104K |
| OnnxTR a docTR (Document Text Recognition) library Onnx pipeline wrapper - for s... | 78K |
| NeMo Retriever Library is a scalable, performance-oriented document content and ... | 74K |
| Reflective memory for AI agents | 65K |
| Graph execution engine for agentic AI workflows. | 58K |
| Toolkit for linearizing PDFs for LLM datasets/training | 41K |
| Mindee API Helper Library for Python | 31K |
| NeMo Retriever Library is a scalable, performance-oriented document content and ... | 31K |
| | 30K |
| Math OCR model that outputs LaTeX and markdown | 21K |
| Python library for converting PDF and images to and from Zebra Programming Langu... | 21K |
| OCR model that converts documents to markdown, HTML, or JSON. | 20K |
| Seti Astro Suite Pro | 19K |
| Lightweight, performant, deep table extraction | 13K |
| Command Line Interface (CLI) for bulk processing/loading data into RegScale | 13K |
| A modern RAG ingestion pipeline from Nvidia | 12K |
| Declarative language for composable AI workflows. Devtool for agents and mere hu... | 11K |
| Detect and extract tables to markdown and csv | 10K |
| "Vibe-Trading: Your Personal Trading Agent" | 10K |
| Data Infrastructure providing a declarative, incremental approach for multimodal... | 10K |
| Implementation of Nougat Neural Optical Understanding for Academic Documents | 8K |
| Builder Bootstrap Platform SDK: pure business logic. CLI, MCP, and API import t... | 7K |
| Fast, multimodal context for agents. | 7K |
| YomiToku is an AI-powered document image analysis package designed specifically ... | 7K |
| Library to extract data from files and documents agnostically using LLMs | 6K |
| Extract structured text from pdfs quickly | 6K |
| An Apache-licensed package for extracting, creating, filling, and flattening PDF... | 6K |
| GenAI Processors is a lightweight Python library that enables efficient, paralle... | 5K |
| Easiest way to give context to LLMs; Attachments has the ambition to be the gene... | 5K |
| Graph execution engine for agentic AI workflows. | 5K |
| Structured text extraction framework for digital and scanned PDFs with inline fo... | 4K |
| | 4K |
| Extract structural RC models from IFC files into the elemental-engine JSON contr... | 4K |
| SDK for creating ISCCs (International Standard Content Codes) | 3K |
| LLM <-> Python agentic runtime prototype | 3K |
| data-intelligence-mcp-server is a centralized Model Context Protocol (MCP) serve... | 3K |
| Build, Evaluate, and Optimize AI Systems. Includes evals, RAG, agents, fine-tuni... | 3K |
| Encapsulating Apache Spark for Easy Usage | 2K |
| Markdown grounded to PDF bounding boxes via VLM + Smith-Waterman alignment | 2K |
| Modality-Aware Retrieval Engine inspired by IRPAPERS-style multimodal retrieval | 2K |
| XAIL Data quality | 2K |