71 dependents
| Description | Downloads/month | |
|---|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 287.7M | |
| State-of-the-art speaker diarization toolkit | 2.2M | |
| WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarizatio... | 1.1M | |
| 🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning | 204K | |
| VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice D... | 122K | |
| Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech wi... | 104K | |
| An immersion toolkit for learning Languages through games and other visual media... | 50K | |
| senselab is a Python package that simplifies building pipelines for biometric (e... | 13K | |
| Python implementation of an ASR service | 12K | |
| Echo-TTS inference codebase | 10K | |
| Implementation of HS-TasNet, "Real-time Low-latency Music Source Separation usin... | 6K | |
| A general fine-tuning kit geared toward image/video/audio diffusion models. | 6K | |
| VoxCPM inference engine based on Nano-vLLM | 6K | |
| Low-code framework for building custom LLMs, neural networks, and other AI model... | 5K | |
| SGLang is a high-performance serving framework for large language models and mul... | 4K | |
| | 3K | |
| Extract phoneme-level timestamps from speech audio. | 2K | |
| Collection of utility tools and deep learning methods for multimodal feature ext... | 2K | |
| SILMA TTS v1 Official Repo — a Lightweight Open Bilingual Text to Speech Model | 2K | |
| A Python package for extracting and decomposing rhythmic facial movements from v... | 2K | |
| Liquid Audio - Speech-to-Speech audio models by Liquid AI | 2K | |
| A unified inference and post-training framework for accelerated video generation... | 2K | |
| Minimal implementation of HuBERT pretraining | 1K | |
| SHAP implementation for multimodal large language models supporting audio and te... | 956 | |
| Open language modeling toolkit based on PyTorch | 752 | |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation. | 704 | |
| Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper | 698 | |
| Modern speech recognition with word-level timestamps and speaker diarization. Fo... | 687 | |
| Helper package for using quantized versions of the Indic ASR Model by AI4Bharat. | 590 | |
| X-Voice | 571 | |
| Constraint-aware audio resynthesis and distillation pipeline. | 505 | |
| Core shared libraries for multimodal Kani extensions. | 475 | |
| Code for the paper Hybrid Spectrogram and Waveform Source Separation | 469 | |
| PyPI package for KaniTTS-2 model | 432 | |
| VoxCPM inference engine based on Nano-vLLM | 365 | |
| SoTA open-source TTS | 360 | |
| Chatterbox: Open Source TTS and Voice Conversion by Resemble AI | 353 | |
| Tensor's VLA Training Infrastructure for Real-World Robotics in PyTorch | 353 | |
| Local AI audio restoration. Phone recording to podcast quality. Zero cloud. | 346 | |
| A standalone version of [LerobotDataset](https://github.com/huggingface/lerobot/... | 320 | |
| Simple, readable audio stem separation library | 271 | |
| Tools for downloading and processing Ocean Networks Canada hydrophone data | 264 | |
| Unit HiFi-GAN | 238 | |
| VocalID is an open-source Python library for voice authentication using ECAPA-TD... | 223 | |
| A package for DJ ideation with ML and AI. | 202 | |
| RoboCOIN: A toolkit for the RoboCOIN dataset | 197 | |
| Indic Conformer ASR Lib | 195 | |
| HuBERT (Hidden Unit BERT) implementation in MLX for Apple Silicon | 192 | |
| ACE-Step 1.5 | 190 | |
| SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support | 186 |