Dependents of torchcodec

71 dependents

Package	Description	Downloads/month
sglang	SGLang is a high-performance serving framework for large language models and mul...	287.7M
pyannote-audio	State-of-the-art speaker diarization toolkit	2.2M
whisperx	WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarizatio...	1.1M
lerobot	🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning	204K
voxcpm	VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice D...	122K
f5-tts	Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech wi...	104K
gamesentenceminer	An immersion toolkit for learning Languages through games and other visual media...	50K
senselab	senselab is a Python package that simplifies building pipelines for biometric (e...	13K
caul	Python implementation of an ASR service	12K
echo-tts	Echo-TTS inference codebase	10K
hs-tasnet	Implementation of HS-TasNet, "Real-time Low-latency Music Source Separation usin...	6K
simpletuner	A general fine-tuning kit geared toward image/video/audio diffusion models.	6K
nano-vllm-voxcpm	VoxCPM inference engine based on Nano-vLLM	6K
ludwig	Low-code framework for building custom LLMs, neural networks, and other AI model...	5K
sglang-kt	SGLang is a high-performance serving framework for large language models and mul...	4K
buzz-captions		3K
bournemouth-forced-aligner	Extract phoneme-level timestamps from speech audio.	2K
exordium	Collection of utility tools and deep learning methods for multimodal feature ext...	2K
silma-tts	SILMA TTS v1 Official Repo — a Lightweight Open Bilingual Text to Speech Model	2K
face-rhythm	A Python package for extracting and decomposing rhythmic facial movements from v...	2K
liquid-audio	Liquid Audio - Speech-to-Speech audio models by Liquid AI	2K
fastvideo	A unified inference and post-training framework for accelerated video generation...	2K
minimal-hubert	Minimal implementation of HuBERT pretraining	1K
mllm-shap	SHAP implementation for multimodal large language models supporting audio and te...	956
eole	Open language modeling toolkit based on PyTorch	752
t2v-metrics	Evaluating Text-to-Visual Generation with Image-to-Text Generation.	704
dinscribe	Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper	698
murmurai-core	Modern speech recognition with word-level timestamps and speaker diarization. Fo...	687
indic-asr-onnx	Helper package for using quantized versions of the Indic ASR Model by AI4Bharat.	590
x-voice	X-Voice	571
card-framework	Constraint-aware audio resynthesis and distillation pipeline.	505
kani-multimodal-core	Core shared libraries for multimodal Kani extensions.	475
demucs-torchcodec	Code for the paper Hybrid Spectrogram and Waveform Source Separation	469
kani-tts-2	PyPi package for KaniTTS-2 model	432
nano-vllm-voxcpm-hifi	VoxCPM inference engine based on Nano-vLLM	365
chatterbox-mlx	SoTA open-source TTS	360
chatterbox-ng	Chatterbox: Open Source TTS and Voice Conversion by Resemble AI	353
opentau	Tensor's VLA Training Infrastructure for Real-World Robotics in PyTorch	353
phonepod	Local AI audio restoration. Phone recording to podcast quality. Zero cloud.	346
lerobot-dataset	A standalone version of [LerobotDataset](https://github.com/huggingface/lerobot/...	320
stem-splitter	Simple, readable audio stem separation library	271
onc-hydrophone-data	Tools for downloading and processing Ocean Networks Canada hydrophone data	264
unit-hifigan	Unit HiFi-GAN	238
vocalid	VocalID is an open-source Python library for voice authentication using ECAPA-TD...	223
djai	A package for DJ ideation with ML and AI.	202
robocoin	RoboCOIN: A toolkit for the RoboCOIN dataset	197
shruti	Indic Conformer ASR Lib	195
mlx-hubert	HuBERT (Hidden Unit BERT) implementation in MLX for Apple Silicon	192
ace-step15-fork	ACE-Step 1.5	190
power-sglang-cuda124	SGLang fork for ppc64le with CUDA 12.4 and Torch Triton support	186