82 dependents
| Description | Downloads/month |
|---|---|
| Python library for audio and music analysis | 9.7M |
| Open Source framework for voice and multimodal conversational AI | 677K |
| A Python library for audio data augmentation. Useful for making audio ML models ... | 178K |
| Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream liv... | 104K |
| Open Source framework for voice and multimodal conversational AI | 11K |
| Accurate and general beat tracker | 11K |
| Ubo main app, running on device initialization. A platform for running other app... | 8K |
| Universal local runtime for STT and TTS models | 6K |
| FASR: Fast Automatic Speech Recognition Pipeline | 4K |
| Ultimate RVC | 4K |
| Software Decoder for raw rf captures of laserdisc, vhs and other analog video fo... | 4K |
| Clarity Challenge toolkit - software for building Clarity Challenge systems | 3K |
| A simple yet effective Audio-to-Midi Automatic Piano Transcription system | 3K |
| Visualize and maintain datasets to develop and understand data-driven algorithms... | 3K |
| Use bacpipe to streamline the process of generating embeddings and analysing you... | 3K |
| Global hotkeys to record speech and transcribe directly to your cursor | 2K |
| A Creative Computing Python Library for Interactive Audio Generation and Audio R... | 2K |
| A framework for computer music in python | 2K |
| The Phoneme Discovery Benchmark | 2K |
| DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audi... | 1K |
| A unified interface to extract hidden representations from speech foundation mod... | 1K |
| Illuminat: Revolutionizing Education through Personalization | 1K |
| Build 📞 Telephonic-Grade Voice AI — 🌐 WebRTC-Ready Framework | 1K |
| GPT-SoVITS ONNX Inference Engine & Model Converter | 1K |
| An easy-to-use library and command-line tool for TTS | 1K |
| Easy Audio Interfaces is a Python library that provides a simple and flexible wa... | 1K |
| Music Modeling Kit | 966 |
| OSEkit | 889 |
| Tiny macOS dictation tool on your menubar | 762 |
| A python forced alignment package | 641 |
| VoxCPM TTS model with Apple Neural Engine backend server | 641 |
| Metrics to measure the quality of audio | 600 |
| This is a library consisting of pre-trained models for the synthesis of Russian ... | 532 |
| BackdoorMBTI is an open source project expanding the unimodal backdoor learning ... | 531 |
| Score UltraStar karaoke files against vocal audio using Vocaluxe pitch detection... | 484 |
| Real-time audio transcription MCP server for Claude Code | 483 |
| Unified, inference-only toolkit for MT3 model family (Magenta MT3, MR-MT3, MT3-P... | 481 |
| Python Wrapper of Silero VAD | 476 |
| Voxtral audio processing and model implementation for Apple Silicon using MLX | 438 |
| GPT-SoVITS ONNX Inference Engine & Model Converter | 426 |
| Live music performance playback | 416 |
| PDF processing pipeline: remove headers/footers, convert to markdown, and genera... | 410 |
| Shandong Unicom Industrial Internet AI Toolbox | 390 |
| Effective evaluations for Text-to-Speech (TTS) systems | 390 |
| | 370 |
| TensorFlow generation of SOX-style spectrograms on the GPU | 359 |
| FFTrack is a Python-based music recognition tool that allows users to identify s... | 341 |
| Library for detecting emotions in images and audio using Robobo | 321 |
| Provides high-quality Chinese speech synthesis and voice cloning services based on models such as SparkTTS and OrpheusTTS. | 303 |
| A Python library for applying information theory and AI/ML models to animal comm... | 303 |