102 dependents
Package Description Downloads/month
Academia MCP server: Tools for automatic scientific research 12K
8K
A Python library for processing receipts, extracting key information, and assist... 8K
7K
4K
PPOCRLabelv3 is a semi-automatic graphic annotation tool suitable for OCR field,... 4K
Modular media quality metrics toolkit. 3K
Agent S: an open agentic framework that uses computers like a human 3K
Query public data sources worldwide through a unified CLI and REST API 3K
DocumentAI-std is a Python library designed to facilitate and standardize docume... 2K
2K
Fast PaddleOCR MCP server - Extract text from images using PaddleOCR with optimi... 2K
Ingest sources with proper citation — PDF, URL, media, Office, DJVU 1K
Natural-language-based, cross-platform, cross-framework BDD UI automation testing solution. BDD testing, Python style, presented by Trip Flight 1K
1K
A library for electronic Know Your Customer (eKYC) verification 1K
Video Archive AI analysis tool 1K
Huggingface bolts for geniusrise 1K
Multi-engine OCR pipeline — beats Google Vision API 1K
Collection of Taiwan Rental House Data from Public Website 1K
Parse, extract, and analyze documents with ease 1K
Blocks group members from sending advertising content; the filtered content is configurable 1K
963
Extract text and information from PDF files 929
An AI assistant powered by Llama models 913
A quote library plugin for QQ group chats 900
FlexiData is an open-source Python package designed for processing unstructured ... 720
718
Plugins to enable usage of PaddleOCR in ocr_translate 716
Exploit computer vision technology with Orange Data Mining! 642
Actscene OCR: a comprehensive OCR pipeline for Japanese documents (PaddleOCR-based) 601
uni cli 573
airclick-related Python package 569
A fast automatic number-plate recognition (ANPR) library 562
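Cross-platform UI automation framework, suitable for hybrid apps 554
Privision is a powerful video content redaction tool that uses advanced OCR technology to automatically detect and mask sensitive information in videos. It supports multiple detection modes, including phone numbers, ID card numbers, and custom keywords, and provides... 469
Comic-Focused Hybrid OCR Library, made in Python 436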
PaddleOCR engine plugin for OCRmyPDF 432
Using LLM to parse PDF and get better chunk for retrieval 428
A tool to classify images 414
Convert documents to structured Markdown: free your knowledge from PDFs 384
A modular QSR Order Verification Python Package 381
Extracts citations from PDF, URLs and local media files in CSL-JSON. 369
OCR extraction of court case files: PDF to Markdown 363
Safely split images into sections along visual boundaries & OCR (Korean, Chinese) 353
Use json5 for view-based workflows configuration 340
A powerful tool to extract text, tables, charts, and formulas from documents and... 329
A robust MRZ extraction and validation engine library designed for real-world K... 328
Deterministic generator of document identifiers with OCR 313
Local-first Python RAG pipeline with sentence-transformer embeddings, FAISS/BM25... 307