Vision Language Model Python Packages

mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

349K 5K 506

colpali-engine

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

164K 3K 250

lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

16K 4K 578

mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

14K 1K 79

x-anylabeling-cvhub

Effortless data labeling with AI support from Segment Anything and other awesome models.

6K 9K 971

vidore-benchmark

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

4K 271 35

thepipe-api

Get clean data from tricky documents, powered by vision-language models ⚡

3K 2K 99

lica-gdb

GDB: GraphicDesignBench — benchmark suite for evaluating vision-language models on graphic design tasks

2K 6 1

x-anylabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

1K 9K 971

mbodied

Seamlessly integrate state-of-the-art transformer models into robotics stacks

1K 285 32

nerif

LLM-powered Python

1K 15 5

lica-gdb-helm

GDB: GraphicDesignBench - A real-world benchmark for evaluating AI on graphic design tasks

1K 6 1

urban-worm

Workflow of reproducible multimodal inference for urban environment evaluation.

1K 5 4

llava-torch

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V-level capabilities and beyond.

815 25K 3K

kani-vision

Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

725 7 0

kanoa

AI-powered interpretation of data science outputs with multi-backend support (Molmo, Gemini, Claude, OpenAI)

722 1 0

visionzip

Official repository for VisionZip (CVPR 2025)

581 427 23

unsloth-mlx

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

528 1K 79

kani-multimodal-core

Core shared libraries for multimodal Kani extensions.

475 2 0