61 dependents
Package Description Downloads/month
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/J... 282K
Training library for Megatron-based models with bidirectional Hugging Face conve... 29K
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio T... 16K
OpenCompass VLM Evaluation Kit for Eval-Scope 15K
Official implementation of HPSv3: Towards Wide-Spectrum Human Preference Score (... 6K
vLLM plugin for RBLN NPU 4K
Modular media quality metrics toolkit. 3K
AISAK, short for Artificially Intelligent Swiss Army Knife, is a general-purpose... 3K
Photonamer: Autonomous photo file renaming tool using local Visual-Language Mode... 2K
Cosmos-RL is a flexible and scalable Reinforcement Learning framework specialize... 2K
Roboreason package 2K
A PyTorch library for multi-modal image translation with diffusion bridges, GANs... 2K
Agent-as-Annotators: Structured Distillation of Web Agent Capabilities 1K
A wrapper for the Qwen2-VL model based for image-based inference to convert pdf ... 1K
INF Tech's open-source MLLMs for SOTA visual-language understanding and advanced... 1K
Local multimodal semantic search for large documents with complex diagrams (like... 1K
An ML package for GStreamer 1K
Cosmos-Predict2 is a collection of general-purpose world foundation models for P... 1K
Dora Node for VLM 984
Computer Use OOTB 853
[ICLR 2026] EditScore: Unlocking Online RL for Image Editing via High-Fidelity R... 730
Evaluating Text-to-Visual Generation with Image-to-Text Generation. 704
llama-index multi_modal_llms HuggingFace integration by [Cihan Yalçın](https://w... 681
Dora Node for VLM 600
Uses a VLM to caption images from a dataset. 581
Hunyuan Video 1.5 531
A Python module for efficient multi-model AI inference with memory management 470
AI agent swarm orchestrator for coding 424
Geospatial Vision-Language Model analysis for street-level imagery. Download Map... 406
Dora Node for RDT 1B 397
An integrated fine-tuning platform for lightweight vlmOCR models 382
Python runtime for Orign 376
siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Ag... 348
[ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understand... 334
This repo is a fork of the original VLM2Vec repo, modified for easy Pyserini int... 326
A Python package for OCR using Vision LLMs 269
(no description) 260
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 20... 208
Slimmed release mirror of UniTrust for AEN and TruthPrInt. 192
Benchmark utilities and environments for evaluating multimodal LLMs' proactivene... 190
Core package for OmAgent 182
Use a local LLM to convert PDF to Markdown 175
Vision-Language Model Interpretability Analysis - One Token at a Time 168
NVIDIA Cosmos Reason VLM provider for Strands Agents - physical AI reasoning, vi... 157
Add your description here 146
We are building a python package for building computer use capability that can a... 122
A simple no frills brute force unoptimized training package for VLMs 121
Helper utilities and constants for GroundNext models - Computer Use Agents for g... 118
A library for automating web tasks 110
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA mod... 99