336 dependents
Package Description Downloads/month
Plumb a PDF for detailed information about each char, rectangle, line, et cetera... 27.6M
An Agent-Computer Interface (ACI) designed for software development agents OpenH... 1.4M
A pure Python cleanroom implementation of libmagic, with instrumented parsing fr... 1.2M
A Python library to extract tabular data from PDFs 851K
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be search... 835K
Collection of open-source libraries and tools for Robotic Process Automation (RP... 471K
extract text from any document. no muss. no fuss. 381K
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... 282K
OpenViking is an open-source context database designed specifically for AI Agent... 211K
TypeCode provides comprehensive filetype and mimetype detection using multiple d... 81K
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... 77K
🔍 ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... 74K
Maintained fork of deanmalmgren/textract to replace '*' dependencies and other u... 53K
A free, open-source expert system for guided interviews and document assembly, b... 46K
AutoCoder: AutoCoder 37K
Python tool for converting files and office documents to Markdown. 29K
A free, open-source expert system for guided interviews and document assembly, b... 21K
Pythonic class-based interface to LLMs 17K
[EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - based on A... 17K
RESTAI, so many 'A's and 'I's, so little time... 17K
a utility to extract the title from a PDF file 14K
Aient: The Awakening of Agent. 14K
Parser for Consolidated Account Statements (CAS) generated from CAMS/Karvy/Kfint... 14K
Converting PDF files to text, mainly with a focus on arXiv papers. 13K
A pure Python cleanroom implementation of libmagic, with instrumented parsing fr... 13K
A Python library to extract tabular data from PDFs 12K
AI Execute Services - A middleware framework for AI-powered task execution and t... 9K
modelmerge is a multi-large language model API aggregator. 9K
MAS 9K
Extracts and formats text annotations from a PDF file 8K
A web scraping library based on LangChain which uses LLM and direct graph logic ... 8K
A simple resume parser used for extracting information from resumes 7K
CredSweeper is a tool to detect credentials in any directories or files. CredSwe... 6K
A flow language for botrun 6K
PredaCore — the apex autonomous agent. Hybrid Rust memory kernel, topped LongMem... 6K
Ad removal tool for PDFs. 6K
Builder Bootstrap Platform SDK — pure business logic. CLI, MCP, and API import t... 6K
Harness that gives frontier models full system access — shell, filesystem, brows... 6K
A library and command line tool for extracting indicators of compromise (IOCs) f... 5K
A command line application for downloading scientific papers. 5K
foundry-mcp 5K
InPhO Topic Explorer 5K
🔍 ScanCode detects licenses, copyrights, dependencies by "scanning code" ...... 5K
Improved file parsing for LLMs 4K
Enhances Review Board with PDF review and diffing, reports and analytics, new so... 4K
The file management automation tool. 4K
A better pdf to text extraction toolkit 4K
LLM-assisted biomedical literature screening and structured extraction. Supports... 4K
Framework for prototyping of LLM-based applications 4K
LLM <-> Python agentic runtime prototype 3K