336 dependents
| Description | Downloads/month |
|---|---|
| Plumb a PDF for detailed information about each char, rectangle, line, et cetera... | 27.6M |
| An Agent-Computer Interface (ACI) designed for software development agents OpenH... | 1.4M |
| A pure Python cleanroom implementation of libmagic, with instrumented parsing fr... | 1.2M |
| A Python library to extract tabular data from PDFs | 851K |
| OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search... | 835K |
| Collection of open-source libraries and tools for Robotic Process Automation (RP... | 471K |
| extract text from any document. no muss. no fuss. | 381K |
| Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... | 282K |
| OpenViking is an open-source context database designed specifically for AI Agent... | 211K |
| TypeCode provides comprehensive filetype and mimetype detection using multiple d... | 81K |
| Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... | 77K |
| :mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... | 74K |
| Maintained fork of deanmalmgren/textract to replace '*' dependencies and other u... | 53K |
| A free, open-source expert system for guided interviews and document assembly, b... | 46K |
| AutoCoder: AutoCoder | 37K |
| Python tool for converting files and office documents to Markdown. | 29K |
| A free, open-source expert system for guided interviews and document assembly, b... | 21K |
| Pythonic class-based interface to LLMs | 17K |
| [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - based on A... | 17K |
| RESTAI, so many 'A's and 'I's, so little time... | 17K |
| a utility to extract the title from a PDF file | 14K |
| Aient: The Awakening of Agent. | 14K |
| Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfint... | 14K |
| Converting PDF files to text, mainly with a focus on arXiv papers. | 13K |
| A pure Python cleanroom implementation of libmagic, with instrumented parsing fr... | 13K |
| A Python library to extract tabular data from PDFs | 12K |
| AI Execute Services - A middleware framework for AI-powered task execution and t... | 9K |
| modelmerge is a multi-large language model API aggregator. | 9K |
| MAS | 9K |
| Extracts and formats text annotations from a PDF file | 8K |
| A web scraping library based on LangChain which uses LLM and direct graph logic ... | 8K |
| A simple resume parser used for extracting information from resumes | 7K |
| CredSweeper is a tool to detect credentials in any directories or files. CredSwe... | 6K |
| A flow language for botrun | 6K |
| PredaCore — the apex autonomous agent. Hybrid Rust memory kernel, topped LongMem... | 6K |
| Ad removal tool for PDFs. | 6K |
| Builder Bootstrap Platform SDK — pure business logic. CLI, MCP, and API import t... | 6K |
| Harness that gives frontier models full system access — shell, filesystem, brows... | 6K |
| A library and command line tool for extracting indicators of compromise (IOCs) f... | 5K |
| A command line application for downloading scientific papers. | 5K |
| foundry-mcp | 5K |
| InPhO Topic Explorer | 5K |
| :mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... | 5K |
| Improved file parsing for LLMs | 4K |
| Enhances Review Board with PDF review and diffing, reports and analytics, new so... | 4K |
| The file management automation tool. | 4K |
| A better pdf to text extraction toolkit | 4K |
| LLM-assisted biomedical literature screening and structured extraction. Supports... | 4K |
| Framework for prototyping of LLM-based applications | 4K |
| LLM <-> Python agentic runtime prototype | 3K |