PyPI Stats

Vision Transformer Python Packages

Python packages with the GitHub topic vision-transformer. Sorted by relevance, with stars and monthly downloads.
open-mmlab
mmdet

OpenMMLab Detection Toolbox and Benchmark

436K 33K 10K
Blaizzy
mlx-vlm

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

348K 5K 524
open-mmlab
mmcls

OpenMMLab Pre-training Toolbox and Benchmark

50K 4K 1K
open-mmlab
mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

21K 4K 1K
lukas-blecher
pix2tex

pix2tex: Using a ViT to convert images of equations into LaTeX code.

11K 16K 1K
emcf
thepipe-api

Get clean data from tricky documents, powered by vision-language models ⚡

3K 2K 99
NVlabs
mambavision

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

2K 2K 139
towhee-io
towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

2K 3K 261
NVlabs
fastervit

FasterViT: Fast Vision Transformers with Hierarchical Attention

2K 914 69
lygitdata
garmentiq

Free & Open Source. Precise and flexible garment measurements from images - no tape measures, no delays, just fashion-forward automation.

2K 20 4
alibaba
pai-easycv

An all-in-one toolkit for computer vision

1K 2K 225
veb-101
attention-and-transformers

Transformers goes brrr... Attention and Transformers from scratch in TensorFlow. Currently contains Vision transformers, MobileViT-v1, MobileViT-v2, MobileViT-v3

1K 14 2
kyegomez
clipq

A simple implementation of CLIP that splits an image into quadrants and then gets the embeddings for each quadrant

1K 7 1
towhee-io
towhee-models

Towhee is a framework that helps you encode your unstructured data into embeddings.

1K 3K 261
sovit-123
vision-transformers

Vision Transformers for image classification, image segmentation, and object detection.

1K 67 9
mit-han-lab
efficientvit-gml

open-set object detector

996 3K 240
fmegahed
conformal-clip

Few-shot CLIP classification with conformal prediction, probability calibration, and reliability metrics.

822 0 0
martinsbruveris
tfimm

TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.

813 291 25
evanatyourservice
image-classification-jax

Image classification in JAX with ViT, resnet, cifar10, cifar100, imagenette, and imagenet

667 3 0
DavidLandup0
deepvision-toolkit

PyTorch and TensorFlow/Keras image models with automatic weight conversions and equal API/implementations - Vision Transformer (ViT), ResNetV2, EfficientNetV2, NeRF, SegFormer, MixTransformer, (planned...) DeepLabV3+, ConvNeXtV2, YOLO, etc.

604 42 7
open-mmlab
mmdet-taeuk4958

OpenMMLab Detection Toolbox and Benchmark

578 33K 10K
zhongkaifu
seq2seqsharp

Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). Its features include automatic differentiation, multiple network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform support (Windows, Linux, macOS), and multimodal models for text and images.

480 211 43
TheoCoombes
clipcap

Using pretrained encoder and language models to generate captions from multimedia inputs.

479 100 14
idso-fa1-pathology
vitaminp

VitaminP: a vision transformer-assisted multimodal integration network for pathology cell segmentation

473 8 1
Data from PyPI, GitHub, ClickHouse, and BigQuery