PyPI Stats

Multimodal Python Packages

Python packages tagged with the GitHub topic "multimodal", sorted by relevance and shown with monthly downloads and GitHub stars.
yzhao062
pyod

A Python library for anomaly detection across tabular, time series, graph, text, and image data, with 60+ detectors, benchmark-backed ADEngine orchestration, and an agentic workflow for AI agents.

2.7M 10K 1K
embeddings-benchmark
mteb

MTEB: Massive Text Embedding Benchmark

2.7M 3K 608
rerun-io
rerun-sdk

An open source SDK for logging, storing, querying, and visualizing multimodal and multi-rate data

2.2M 11K 720
Eventual-Inc
daft

High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale

833K 5K 457
vllm-project
vllm-omni

A framework for efficient model inference with omni-modality models

476K 5K 867
vortex-data
vortex-data

An extensible, state-of-the-art framework for columnar compression, and the fastest FOSS columnar file format. Formerly at @spiraldb, now an Incubation Stage project at LF AI & Data, part of the Linux Foundation.

289K 3K 149
activeloopai
deeplake

Deeplake is an AI data runtime for agents. It provides serverless Postgres with a multimodal data lake, enabling scalable retrieval and training.

228K 9K 709
bentoml
bentoml

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.

197K 9K 959
modelscope
ms-swift

Use PEFT or full-parameter training to run CPT/SFT/DPO/GRPO on 600+ LLMs (Qwen3.6, DeepSeek-R1, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).

176K 14K 1K
vlm-run
vlmrun-hub

A hub for various industry-specific schemas to be used with VLMs.

173K 543 24
jina-ai
jina

☁️ Build multimodal AI applications with a cloud-native stack

154K 22K 2K
docarray
docarray

Represent, send, store and search multimodal data

144K 3K 241
Blaizzy
mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

87K 7K 578
microsoft
torchscale

Foundation Architecture for (M)LLMs

79K 3K 225
open-mmlab
mmcls

OpenMMLab Pre-training Toolbox and Benchmark

50K 4K 1K
datachain-ai
datachain

Data Memory: the operational data context layer for AI agents, with typed, versioned datasets over images, video, docs, and tables

49K 3K 140
rom1504
img2dataset

Easily turn large sets of image URLs into an image dataset. It can download, resize, and package 100M URLs in 20 hours on one machine.

47K 4K 375
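As a rough check on the img2dataset claim, 100M URLs in 20 hours works out to a sustained average of roughly 1,400 URLs per second. The sketch below just does that arithmetic, assuming the 100M and 20-hour figures are exact and the rate is constant over the whole run; the helper name `urls_per_second` is hypothetical and not part of img2dataset.

```python
# Back-of-the-envelope throughput for the img2dataset claim:
# 100M image URLs downloaded, resized, and packaged in 20 hours.
# Assumption: the work is spread evenly over the whole run.

URLS = 100_000_000  # 100M URLs
HOURS = 20          # 20 hours on one machine


def urls_per_second(urls: int, hours: float) -> float:
    """Hypothetical helper: average sustained rate in URLs per second."""
    return urls / (hours * 3600)


rate = urls_per_second(URLS, HOURS)
print(f"{rate:,.0f} URLs/s")          # 1,389 URLs/s
print(f"{rate * 60:,.0f} URLs/min")   # 83,333 URLs/min
```

The same helper gives the average rate for any other dataset size and duration, which makes it easy to compare such throughput claims on a per-second basis.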
Stability-AI
stability-sdk

SDK for interacting with stability.ai APIs (e.g., Stable Diffusion inference)

34K 2K 344
predict-idlab
tsflex

Flexible time series feature extraction & processing

23K 438 28
open-mmlab
mmpretrain

OpenMMLab Pre-training Toolbox and Benchmark

21K 4K 1K
McGill-NLP
weblinx

WebLINX is a benchmark for building web navigation agents with conversational capabilities

20K 160 17
anam-org
metaxy

Pluggable sample-level metadata versioning for incremental multimodal pipelines.

17K 96 6
Capsize-Games
airunner

Offline inference engine for art, real-time voice conversations, LLM-powered chatbots, and automated workflows

14K 1K 97
eliranwong
letmedoit

LetMeDoIt AI is an advanced AI assistant that leverages AI models to resolve daily tasks for you.

14K 135 26
Data from PyPI, GitHub, ClickHouse, and BigQuery