57 dependents
| Description | Downloads/month | |
|---|---|---|
| Open Source framework for voice and multimodal conversational AI | 689K | |
| Open Audio Watermarking Tool | 158K | |
| SoTA open-source TTS | 98K | |
| An open source chat bot architecture for voice/vision (and multimodal) assistant... | 14K | |
| Open Source framework for voice and multimodal conversational AI | 12K | |
| Utilities for handling audio. | 6K | |
| Python conversion of Timbre Toolbox | 6K | |
| zero-shot voice conversion & singing voice conversion, with real-time support | 5K | |
| GailBot API | 4K | |
| Clarity Challenge toolkit - software for building Clarity Challenge systems | 4K | |
| Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Stream... | 3K | |
| Modular media quality metrics toolkit. | 3K | |
| Python SDK for Palabra AI's real-time speech-to-speech translation API. Break do... | 2K | |
| Ableton for AI is the bridge between your DAW and AI models. It makes Ableton pr... | 2K | |
| AI-powered reference track suggestions for music producers | 2K | |
| Speech Recording Application | 2K | |
| A Python library for generating synthetic speech datasets using TTS providers. | 2K | |
| Preprocess audio data | 1K | |
| | 1K | |
| Build 📞 Telephonic-Grade Voice AI — 🌐 WebRTC-Ready Framework | 1K | |
| Python tools for text to speech (TTS), speech to text (STT), and speech to speec... | 1K | |
| Stemgen is a Stem file generator. Convert any track into a stem and have fun wit... | 1K | |
| This is a faster implementation for TTS models, to be used in highly async envir... | 1K | |
| Google EMEA gTech Ads Data Science Team's solution to automatically translate an... | 1K | |
| | 1K | |
| An easy-to-use library and command-line tool for TTS | 1K | |
| Audio ZEN is a library for audio/speech signal processing. | 809 | |
| Loudness added. | 648 | |
| Metrics to measure the quality of audio | 619 | |
| X-Voice | 591 | |
| A library for soundscape synthesis and augmentation - extension to add speech, d... | 531 | |
| fast SoulX-FlashTalk for RTX 4090 | 466 | |
| PDF processing pipeline: remove headers/footers, convert to markdown, and genera... | 414 | |
| data parsing / loading facility for handy audio machine learning. powered by lmd... | 383 | |
| Audio Base App and Framework | 375 | |
| Chatterbox: Open Source TTS and Voice Conversion by Resemble AI | 370 | |
| A comprehensive Model Context Protocol (MCP) server that enables AI agents to cr... | 360 | |
| SeaVAD: Voice Activity Detection module with silero and state machine. | 329 | |
| Local AI audio restoration. Phone recording to podcast quality. Zero cloud. | 324 | |
| Based on models such as SparkTTS and OrpheusTTS, provides high-quality Chinese speech synthesis and voice cloning services. | 323 | |
| Production-ready transcription and diarization pipeline with parallel processing | 302 | |
| Voice Identity Management Platform | 290 | |
| Structured audio comparison for producers and developers. Think git diff, but fo... | 286 | |
| File operations and other tools for working with .WAV files | 265 | |
| CLI Audio Analyzer for Music Producers - DnB, Techno, House | 217 | |
| 210 | ||
| Indic Conformer ASR Lib | 201 | |
| A python package for neural audio effect | 190 | |
| Split to stems, match to reference, recombine and match as a whole. | 167 | |
| Open Source framework for voice and multimodal conversational AI | 161 |