56 dependents
Package Description Downloads/month
Generative models for conditional audio generation 118K
endoreg-db 25K
A package for NeuCodec, based on xcodec2. 24K
Build high-performance AI models with modular building blocks 19K
Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation... 18K
Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch 16K
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural netw... 15K
A generative speech model for daily dialogue. 7K
Implementation of the MetaController proposed in "Emergent temporal abstractions... 7K
Simplest working implementation of Stylegan2, state of the art generative advers... 6K
A general fine-tuning kit geared toward image/video/audio diffusion models. 6K
Implementation of MagViT2 Tokenizer in Pytorch 5K
Implementation of a transformer for reinforcement learning using `x-transformers... 4K
Implementation of the new SOTA for model based RL, from the paper "Improving Tra... 3K
Implementation of Phenaki Video, which uses Mask GIT to produce text guided vide... 3K
Natural Speech 2 - Pytorch 3K
A differentiable version of SPTK 2K
Implementation of Muse: Text-to-Image Generation via Masked Generative Transform... 2K
A library for XCodec2 model. 1K
Genie2 1K
Implementation of Parti, Google's pure attention-based text-to-image neural netw... 1K
This library provides color and style transfer algorithms which were published in... 658
Boson Multimodal - A multimodal AI framework 657
Ichigo is an open, ongoing research experiment to extend a text-based LLM to hav... 475
Superfeel adaptation of implementation of SoundStorm, Efficient Parallel Audio G... 472
ViLLa-X 422
Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens f... 403
ChatTTS is a generative speech model for daily dialogue. 402
PhysioEx, a PyTorch Lightning based library for Interpretable physiological sign... 396
Generative models for conditional audio generation 371
Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... 370
Transformers at zeta scales 369
Audio tokenization, in the fastest way possible! 367
Amplify 360
Glaucus is a PyTorch complex-valued ML autoencoder & RF estimation python module... 340
Fish Speech pipeline as library so you don't need to webui. 334
Foundation model for EEG reconstruction and interpolation 324
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... 310
Fish Speech 1.5 TTS plugin for VOCO audio inference runtime 266
Sylber: Syllabic Embedding Representation of Speech from Raw Audio 247
Implementation of BEST-RQ - a model for self-supervised learning of speech signa... 233
Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... 208
ACE-Step 1.5 190
Tokenizing Loops of Antibodies (arXiv:2509.08707) 186
Progressive Action Tokenizer framework with FSQ quantization 168
MaskBit 137
Streaming DVAE 136
SongBloom package for tts-webui 122
VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice a... 119
Implementation of SoundStream, an end-to-end neural audio codec 107