984 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | Qwen3-VL is the multimodal large language model series developed by Qwen team, A... | 528K |
| | Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python p... | 364K |
| | A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrai... | 357K |
| | 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and p... | 292K |
| | Pytorch implementation of the CREPE pitch tracker | 273K |
| | Qwen-TTS python package | 209K |
| | 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and p... | 202K |
| | A Python library for audio data augmentation. Useful for making audio ML models ... | 178K |
| | Contrastive Language-Audio Pretraining | 175K |
| | Open Audio Watermarking Tool | 157K |
| | Qwen-ASR python package | 127K |
| | VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice D... | 122K |
| | Audiocraft is a library for audio processing and generation with deep learning. ... | 122K |
| | Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech wi... | 104K |
| | High-Quality Voice Cloning TTS for 600+ Languages | 103K |
| | SoTA open-source TTS | 98K |
| | Open source library for running inference workload with Hugging Face Deep Learni... | 89K |
| | Basic Pitch, a lightweight yet powerful audio-to-MIDI converter with pitch bend ... | 63K |
| | | 56K |
| | The python library for real-time communication | 50K |
| | TensorFlow examples | 46K |
| | NeMo Retriever Library is a scalable, performance-oriented document content and ... | 31K |
| | End-to-End Speech Processing Toolkit | 30K |
| | py-webrtcvad wrapper for trimming speech clips | 28K |
| | This code is to run the WARP-Q speech quality metric. | 26K |
| | endoreg-db | 25K |
| | A high-performance API server that provides OpenAI-compatible endpoints for MLX ... | 22K |
| | A nearly-live implementation of OpenAI's Whisper. | 20K |
| | Use machine learning to create art and music | 20K |
| | AI-based Audio Watermarking Tool | 18K |
| | data represent, processing | 17K |
| | CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic co... | 17K |
| | Command line utility for forced alignment using Kaldi | 15K |
| | NeuTTS - a package for text-to-speech generation using Neuphonic's TTS models. | 14K |
| | TensorFlow examples | 13K |
| | Easy to use audio stem separation with a UI, using various models from UVR train... | 12K |
| | Python implementation of an ASR service | 12K |
| | An implementation of Nvidia's Parakeet models for Apple Silicon using MLX. | 11K |
| | so-vits-svc fork with realtime support, improved interface and more features. | 11K |
| | Phenomenological Adaptive STochastic auditory nerve fiber model | 10K |
| | Integrate librosa, whisper with LLMs to analyze music audio. | 10K |
| | This is an auto-testing framework of audio functions for Android devices. | 10K |
| | Vietnamese TTS with instant voice cloning • On-device • Real-time CPU inference ... | 10K |
| | Django Smart Home | 9K |
| | libfmp - Python package for teaching and learning Fundamentals of Music Processi... | 8K |
| | An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, ... | 8K |
| | An STFT/iSTFT for PyTorch. | 8K |
| | FlashSR: One-step Versatile Audio Super-resolution via Diffusion Distillation / ... | 8K |
| | BeatNet is state-of-the-art (Real-Time) and Offline joint music beat, downbeat, ... | 7K |
| | Framework for building deep neural network models for sound, speech, and voice A... | 7K |