97 dependents
Package Description Downloads/month
Academia MCP server: Tools for automatic scientific research 12K
8K
A Python library for processing receipts, extracting key information, and assist... 8K
7K
4K
Structured text extraction framework for digital and scanned PDFs with inline fo... 4K
Modular media quality metrics toolkit. 3K
Agent S: an open agentic framework that uses computers like a human 3K
3K
Query public data sources worldwide through a unified CLI and REST API 3K
DocumentAI-std is a Python library designed to facilitate and standardize docume... 2K
2K
Fast PaddleOCR MCP server - Extract text from images using PaddleOCR with optimi... 2K
Ingest sources with proper citation — PDF, URL, media, Office, DJVU 1K
Natural-language-based, cross-platform and cross-framework BDD UI automated testing solution. BDD testing, Python style, presented by Trip Flight 1K
1K
A library for electronic Know Your Customer (eKYC) verification 1K
Video Archive AI analysis tool 1K
Huggingface bolts for geniusrise 1K
Collection of Taiwan Rental House Data from Public Website 1K
Parse, extract, and analyze documents with ease 1K
Blocks group members from sending advertising content, with configurable filtering 1K
963
Extract text and information from pdf files 929
An AI assistant powered by Llama models 913
A quote library plugin for QQ group chats 900
Convert the model in PaddleOCR to ONNX format 819
FlexiData is an open-source Python package designed for processing unstructured ... 720
718
Plugins to enable usage of PaddleOCR in ocr_translate 716
Exploit computer vision technology with Orange Data Mining! 642
🔥 Address parsing and recognition, Python version 607
Actscene OCR: a comprehensive OCR pipeline for Japanese documents (based on PaddleOCR) 601
airclick-related Python package 569
A fast automatic number-plate recognition (ANPR) library 562
Cross-platform UI automation framework for hybrid apps 554
OCR, Archive, Index and Search: Implementation agnostic OCR framework. 509
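Privision is a powerful video content redaction tool that uses advanced OCR to automatically detect and blur sensitive information in videos. It supports multiple detection modes, including phone numbers, ID card numbers, and custom keywords, and provides... 469
Comic-focused hybrid OCR library, made in Python 436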
PaddleOCR engine plugin for OCRmyPDF 432
Using LLM to parse PDF and get better chunk for retrieval 428
A tool to classify images 414
A modular QSR Order Verification Python Package 381
Extracts citations from PDF, URLs and local media files in CSL-JSON. 369
Use json5 for view-based workflows configuration 340
A powerful tool to extract text, tables, charts, and formulas from documents and... 329
A robust MRZ extraction and validation engine library designed for real-world K... 328
rasa_contrib is an addon package for rasa. It provides some useful/powerful additi... 317
Deterministic document identifier generator with OCR 313
Local-first Python RAG pipeline with sentence-transformer embeddings, FAISS/BM25... 307