PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
Blaizzy
mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

349K 5K 506
illuin-tech
colpali-engine

The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

164K 3K 250
EvolvingLMMs-Lab
lmms-eval

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

16K 4K 578
ARahim3
mlx-tune

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

14K 1K 79
CVHub520
x-anylabeling-cvhub

Effortless data labeling with AI support from Segment Anything and other awesome models.

6K 9K 971
illuin-tech
vidore-benchmark

Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.

4K 271 35
emcf
thepipe-api

Get clean data from tricky documents, powered by vision-language models ⚡

3K 2K 99
lica-world
lica-gdb

GDB: GraphicDesignBench — benchmark suite for evaluating vision-language models on graphic design tasks

2K 6 1
CVHub520
x-anylabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

1K 9K 971
mbodiai
mbodied

Seamlessly integrate state-of-the-art transformer models into robotics stacks

1K 285 32
Nerif-AI
nerif

LLM powered Python

1K 15 5
lica-world
lica-gdb-helm

GDB: GraphicDesignBench - A real-world benchmark for evaluating AI on graphic design tasks

1K 6 1
billbillbilly
urban-worm

Workflow of reproducible multimodal inference for urban environment evaluation.

1K 5 4
haotian-liu
llava-torch

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

815 25K 3K
zhudotexe
kani-vision

Kani extension for supporting vision-language models (VLMs). Comes with model-agnostic support for GPT-Vision and LLaVA.

725 7 0
lhzn-io
kanoa

AI-powered interpretation of data science outputs with multi-backend support (Molmo, Gemini, Claude, OpenAI)

722 1 0
dvlab-research
visionzip

Official repository for VisionZip (CVPR 2025)

581 427 23
ARahim3
unsloth-mlx

Fine-tune LLMs on your Mac with Apple Silicon. SFT, DPO, GRPO, Vision, TTS, STT, Embedding, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

528 1K 79
zhudotexe
kani-multimodal-core

Core shared libraries for multimodal Kani extensions.

475 2 0
Keyvanhardani
german-ocr

High-performance German document OCR - Local & Cloud with GPU/CPU support

439 94 6
NVlabs
ps3-torch

Scaling Vision Pre-Training to 4K Resolution

415 227 10
s-emanuilov
litepali

Lightweight ColPali-based retrieval for cloud

312 122 11
gptscript-ai
gptparse

Document parser for RAG

311 28 2
ymrohit
openscenesense-ollama

Offline video analysis using Ollama models and local Whisper

299 46 9
Data from PyPI, GitHub, ClickHouse, and BigQuery