Dependents of qwen-vl-utils

61 dependents

Package	Description	Downloads/month
mineru	Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J...	282K
megatron-bridge	Training library for Megatron-based models with bidirectional Hugging Face conve...	29K
lmms-eval	One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio T...	16K
ms-vlmeval	OpenCompass VLM Evaluation Kit for Eval-Scope	15K
hpsv3	Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (...	6K
vllm-rbln	vLLM plugin for RBLN NPU	4K
ayase	Modular media quality metrics toolkit.	3K
aisak	AISAK, short for Artificially Intelligent Swiss Army Knife, is a general-purpose...	3K
photonamer	Photonamer: Autonomous photo file renaming tool using local Visual-Language Mode...	2K
cosmos-rl	Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialize...	2K
roboreason	Roboreason package	2K
pytorch-image-translation-models	A PyTorch library for multi-modal image translation with diffusion bridges, GANs...	2K
agent-as-annotators	Agent-as-Annotators: Structured Distillation of Web Agent Capabilities	1K
openzerox	A wrapper for the Qwen2-VL model for image-based inference to convert pdf ...	1K
infinity-parser2	INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced...	1K
ragrag	Local multimodal semantic search for large documents with complex diagrams (like...	1K
gst-python-ml	An ML package for GStreamer	1K
cosmos-predict2	Cosmos-Predict2 is a collection of general-purpose world foundation models for P...	1K
dora-qwen2-5-vl	Dora Node for VLM	984
computer-use-ootb	Computer Use OOTB	853
editscore	[ICLR 2026] EditScore: Unlocking Online RL for Image Editing via High-Fidelity R...	730
t2v-metrics	Evaluating Text-to-Visual Generation with Image-to-Text Generation.	704
llama-index-multi-modal-llms-huggingface	llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://w...	681
dora-qwenvl	Dora Node for VLM	600
vlm-dataset-captioner	Uses a VLM to caption images from a dataset.	581
hyvideo-fork	Hunyuan Video 1.5	531
multimodel-ai	A Python module for efficient multi-model AI inference with memory management	470
acai-swarm	AI agent swarm orchestrator for coding	424
geoai-vlm	Geospatial Vision-Language Model analysis for street-level imagery. Download Map...	406
dora-rdt-1b	Dora Node for RDT 1B	397
kalorda	An integrated fine-tuning platform for lightweight vlmOCR models	382
orign-runtime	Python runtime for Orign	376
siirl	siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Ag...	348
unipercept-reward	[ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understand...	334
vlm2vec-for-pyserini	This repo is a fork of the original VLM2Vec repo, modified for easy Pyserini int...	326
bocr	A Python package for OCR using Vision LLMs	269
beprepared		260
livecc-utils	LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 20...	208
unitrust	Slimmed release mirror of UniTrust for AEN and TruthPrInt.	192
proactivebench	Benchmark utilities and environments for evaluating multimodal LLMs' proactivene...	190
omagent-core	Core package for OmAgent	182
pdf2md-llm	Use a local LLM to convert PDF to Markdown	175
otat	Vision-Language Model Interpretability Analysis - One Token at a Time	168
strands-cosmos	NVIDIA Cosmos Reason VLM provider for Strands Agents - physical AI reasoning, vi...	157
seagnal	Add your description here	146
computer-interact	We are building a python package for building computer use capability that can a...	122
brute-force-training	A simple, no-frills, brute-force, unoptimized training package for VLMs	121
groundcua	Helper utilities and constants for GroundNext models - Computer Use Agents for g...	118
web-operator	A library for automating web tasks	110
openemma	OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA mod...	99