449 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 298.9M |
| Faster Whisper transcription with CTranslate2 | 7.6M |
| WebRTC and ORTC implementation for Python using asyncio | 3.4M |
| A framework for building realtime voice AI agents 🤖🎙️📹 | 3.4M |
| Qwen3-VL is the multimodal large language model series developed by Qwen team, A... | 1.9M |
| Turn any computer or edge device into a command center for your computer vision ... | 1.1M |
| Qwen3-VL is the multimodal large language model series developed by Qwen team, A... | 528K |
| A framework for efficient model inference with omni-modality models | 476K |
| A community-maintained Python framework for creating mathematical animations. | 292K |
| 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning | 204K |
| A library containing components related to model inferences in Gen AI applicatio... | 167K |
| Audiocraft is a library for audio processing and generation with deep learning. ... | 122K |
| Turn any computer or edge device into a command center for your computer vision ... | 116K |
| Real-time avatar engine: 100+ FPS on CPU. Generate lip-synced video, stream liv... | 108K |
| The Python Risk Identification Tool for LLMs (PyRIT) is a library used to assess... | 83K |
| A streaming audio reader, processor, and writer built on top of soundfile, and P... | 67K |
| Voice chats, private incoming and outgoing calls in Telegram for Developers | 64K |
| File handling utilities for CrewAI multimodal inputs | 53K |
| Indent is an AI Pair Programmer | 51K |
| Python API for UniFi Protect (Unofficial) | 46K |
| Using RVC via console or python scripts | 38K |
| All-in-one repository for state-of-the-art NeRFs | 34K |
| Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) | 29K |
| A high-performance API server that provides OpenAI-compatible endpoints for MLX ... | 22K |
| This tool has been deprecated. Use Agentic Document Extraction instead. | 21K |
| Facial Expression Analysis Toolbox | 20K |
| Developer kit for working with the NVIDIA Physical AI Autonomous Vehicles Datase... | 18K |
| data represent, processing | 17K |
| One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio T... | 17K |
| Synchronized audio player for Sendspin servers | 16K |
| A deep learning library for video understanding research. | 13K |
| Fast and light weight simulator of rigid poly-articulated systems. | 13K |
| New libcamera based python library | 13K |
| Add your description here | 12K |
| Argoverse 2: Next generation datasets for self-driving perception and forecastin... | 11K |
| Telegram bot for expanding external media links | 11K |
| Musicdl: A lightweight music downloader written in pure python. (A lightweight lossless music downloader, supporting dozens of... | 10K |
| [ACM MM Award] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dat... | 10K |
| Data Infrastructure providing a declarative, incremental approach for multimodal... | 10K |
| Warship Girls R all-in-one bundle | 9K |
| Turn any computer or edge device into a command center for your computer vision ... | 9K |
| ZebraZoom can be used on fixed background videos to track the heads and tails of... | 8K |
| Command line tool to help manage a library of music albums | 7K |
| WebRTC and ORTC implementation for Python using asyncio | 7K |
| A set of helper functions for processing and integrating visual inputs with Molm... | 7K |
| OpenSportsLib is the professional library, designed for advanced video understan... | 7K |
| Fast, multimodal context for agents. | 7K |
| Turn any computer or edge device into a command center for your computer vision ... | 7K |
| VideoSDK Python SDK | 7K |
| Super SLURPy: Python version of Speech and Language Ultrasound Research Package | 7K |