449 dependents
Package Description Downloads/month
SGLang is a high-performance serving framework for large language models and mul... 298.9M
Faster Whisper transcription with CTranslate2 7.6M
WebRTC and ORTC implementation for Python using asyncio 3.4M
A framework for building realtime voice AI agents 🤖🎙️📹 3.4M
Qwen3-VL is the multimodal large language model series developed by Qwen team, A... 1.9M
Turn any computer or edge device into a command center for your computer vision ... 1.1M
Qwen3-VL is the multimodal large language model series developed by Qwen team, A... 528K
A framework for efficient model inference with omni-modality models 476K
A community-maintained Python framework for creating mathematical animations. 292K
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning 204K
A library containing components related to model inferences in Gen AI applicatio... 167K
Audiocraft is a library for audio processing and generation with deep learning. ... 122K
Turn any computer or edge device into a command center for your computer vision ... 116K
Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream liv... 108K
The Python Risk Identification Tool for LLMs (PyRIT) is a library used to assess... 83K
A streaming audio reader, processor, and writer built on top of soundfile, and P... 67K
Voice chats, private incoming and outgoing calls in Telegram for Developers 64K
File handling utilities for CrewAI multimodal inputs 53K
Indent is an AI Pair Programmer 51K
Python API for UniFi Protect (Unofficial) 46K
Using RVC via console or python scripts 38K
All-in-one repository for state-of-the-art NeRFs 34K
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) 29K
A high-performance API server that provides OpenAI-compatible endpoints for MLX ... 22K
This tool has been deprecated. Use Agentic Document Extraction instead. 21K
Facial Expression Analysis Toolbox 20K
Developer kit for working with the NVIDIA Physical AI Autonomous Vehicles Datase... 18K
data represent, processing 17K
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio T... 17K
Synchronized audio player for Sendspin servers 16K
A deep learning library for video understanding research. 13K
Fast and light weight simulator of rigid poly-articulated systems. 13K
New libcamera based python library 13K
Add your description here 12K
argoverse av2
Argoverse 2: Next generation datasets for self-driving perception and forecastin... 11K
Telegram bot for expanding external media links 11K
Musicdl: A lightweight music downloader written in pure python. (Lightweight lossless music downloader, supports dozens... 10K
[ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dat... 10K
Data Infrastructure providing a declarative, incremental approach for multimodal... 10K
战舰少女R (Warship Girls R) all-in-one toolkit 9K
Turn any computer or edge device into a command center for your computer vision ... 9K
ZebraZoom can be used on fixed background videos to track the heads and tails of... 8K
Command line tool to help manage a library of music albums 7K
WebRTC and ORTC implementation for Python using asyncio 7K
A set of helper functions for processing and integrating visual inputs with Molm... 7K
OpenSportsLib is the professional library, designed for advanced video understan... 7K
Fast, multimodal context for agents. 7K
Turn any computer or edge device into a command center for your computer vision ... 7K
VideoSDK Python SDK 7K
Super SLURPy: Python version of Speech and Language Ultrasound Research Package 7K