71 dependents
Package Description Downloads/month
SGLang is a high-performance serving framework for large language models and mul... 287.7M
State-of-the-art speaker diarization toolkit 2.2M
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarizatio... 1.1M
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning 204K
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice D... 122K
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech wi... 104K
An immersion toolkit for learning Languages through games and other visual media... 50K
senselab is a Python package that simplifies building pipelines for biometric (e... 13K
Python implementation of an ASR service 12K
Echo-TTS inference codebase 10K
Implementation of HS-TasNet, "Real-time Low-latency Music Source Separation usin... 6K
A general fine-tuning kit geared toward image/video/audio diffusion models. 6K
VoxCPM inference engine based on Nano-vLLM 6K
Low-code framework for building custom LLMs, neural networks, and other AI model... 5K
SGLang is a high-performance serving framework for large language models and mul... 4K
3K
Extract phoneme-level timestamps from speech audio. 2K
Collection of utility tools and deep learning methods for multimodal feature ext... 2K
SILMA TTS v1 Official Repo — a Lightweight Open Bilingual Text to Speech Model 2K
A Python package for extracting and decomposing rhythmic facial movements from v... 2K
Liquid Audio - Speech-to-Speech audio models by Liquid AI 2K
A unified inference and post-training framework for accelerated video generation... 2K
Minimal implementation of HuBERT pretraining 1K
SHAP implementation for multimodal large language models supporting audio and te... 956
Open language modeling toolkit based on PyTorch 752
Evaluating Text-to-Visual Generation with Image-to-Text Generation. 704
Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper 698
Modern speech recognition with word-level timestamps and speaker diarization. Fo... 687
Helper package for using quantized versions of the Indic ASR Model by AI4Bharat. 590
X-Voice 571
Constraint-aware audio resynthesis and distillation pipeline. 505
Core shared libraries for multimodal Kani extensions. 475
Code for the paper Hybrid Spectrogram and Waveform Source Separation 469
PyPi package for KaniTTS-2 model 432
VoxCPM inference engine based on Nano-vLLM 365
SoTA open-source TTS 360
Chatterbox: Open Source TTS and Voice Conversion by Resemble AI 353
Tensor's VLA Training Infrastructure for Real-World Robotics in PyTorch 353
Local AI audio restoration. Phone recording to podcast quality. Zero cloud. 346
A standalone version of [LerobotDataset](https://github.com/huggingface/lerobot/... 320
Simple, readable audio stem separation library 271
Tools for downloading and processing Ocean Networks Canada hydrophone data 264
Unit HiFi-GAN 238
VocalID is an open-source Python library for voice authentication using ECAPA-TD... 223
A package for DJ ideation with ML and AI. 202
RoboCOIN: A toolkit for the RoboCOIN dataset 197
Indic Conformer ASR Lib 195
HuBERT (Hidden Unit BERT) implementation in MLX for Apple Silicon 192
ACE-Step 1.5 190
SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support 186