56 dependents
| Description | Downloads/month |
|---|---:|
| Generative models for conditional audio generation | 118K |
| endoreg-db | 25K |
| A package for NeuCodec, based on xcodec2. | 24K |
| Build high-performance AI models with modular building blocks | 19K |
| Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation... | 18K |
| Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch | 16K |
| Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural netw... | 15K |
| A generative speech model for daily dialogue. | 7K |
| Implementation of the MetaController proposed in "Emergent temporal abstractions... | 7K |
| Simplest working implementation of Stylegan2, state of the art generative advers... | 6K |
| A general fine-tuning kit geared toward image/video/audio diffusion models. | 6K |
| Implementation of MagViT2 Tokenizer in Pytorch | 5K |
| Implementation of a transformer for reinforcement learning using `x-transformers... | 4K |
| Implementation of the new SOTA for model based RL, from the paper "Improving Tra... | 3K |
| Implementation of Phenaki Video, which uses Mask GIT to produce text guided vide... | 3K |
| Natural Speech 2 - Pytorch | 3K |
| A differentiable version of SPTK | 2K |
| Implementation of Muse: Text-to-Image Generation via Masked Generative Transform... | 2K |
| A library for XCodec2 model. | 1K |
| Genie2 | 1K |
| Implementation of Parti, Google's pure attention-based text-to-image neural netw... | 1K |
| This library provides color and style transfer algorithms which were published in... | 658 |
| Boson Multimodal - A multimodal AI framework | 657 |
| Ichigo is an open, ongoing research experiment to extend a text-based LLM to hav... | 475 |
| Superfeel adaptation of implementation of SoundStorm, Efficient Parallel Audio G... | 472 |
| ViLLa-X | 422 |
| Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens f... | 403 |
| ChatTTS is a generative speech model for daily dialogue. | 402 |
| PhysioEx, a PyTorch Lightning based library for Interpretable physiological sign... | 396 |
| Generative models for conditional audio generation | 371 |
| Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... | 370 |
| Transformers at zeta scales | 369 |
| Audio tokenization, in the fastest way possible! | 367 |
| Amplify | 360 |
| Glaucus is a PyTorch complex-valued ML autoencoder & RF estimation python module... | 340 |
| Fish Speech pipeline as a library, so you don't need the webui. | 334 |
| Foundation model for EEG reconstruction and interpolation | 324 |
| [NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... | 310 |
| Fish Speech 1.5 TTS plugin for VOCO audio inference runtime | 266 |
| Sylber: Syllabic Embedding Representation of Speech from Raw Audio | 247 |
| Implementation of BEST-RQ - a model for self-supervised learning of speech signa... | 233 |
| Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... | 208 |
| ACE-Step 1.5 | 190 |
| Tokenizing Loops of Antibodies (arXiv:2509.08707) | 186 |
| Progressive Action Tokenizer framework with FSQ quantization | 168 |
| MaskBit | 137 |
| Streaming DVAE | 136 |
| SongBloom package for tts-webui | 122 |
| VoiceStudio: A unified toolkit for text-style prompted speech synthesis, voice a... | 119 |
| Implementation of SoundStream, an end-to-end neural audio codec | 107 |