54 dependents
| Package description | Downloads/month | |
|---|---|---|
| An immersion toolkit for learning Languages through games and other visual media... | 49K | |
| Tools for building wake-word and speech-command datasets and models. | 7K | |
| Fish Speech | 6K | |
| Very fast, accurate speaker diarization | 4K | |
| Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache... | 4K | |
| 🥰 Building AI-based conversational avatars lightning fast ⚡️💬 | 3K | |
| Lightweight offline voice assistant for hands-free music control (YouTube Music ... | 3K | |
| Use your voice to trigger events and communicate with AI Agents. | 3K | |
| high quality multi-lingual speech to text | 2K | |
| Speech recognition with accurate word-level timestamps. | 2K | |
| VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Spe... | 1K | |
| | 1K | |
| Forced alignment pipeline designed for efficiency and ease of use. | 1K | |
| An ASR API server for FunASR | 1K | |
| Dora Node for Text translating using Argostranslate | 1K | |
| Python toolkit for the Qwen3-ASR API—parallel high‑throughput calls, robust long... | 1K | |
| Useful audio tools | 959 | |
| Normalize WAV voice recordings to a consistent target dB level using AGC, VAD, a... | 841 | |
| Audio ZEN is a library for audio/speech signal processing. | 809 | |
| Generic Speech AI Platform - Ollama for Voice Models | 645 | |
| Package to run ABC-MRT16 intelligibility tests | 610 | |
| Speechless repo for sales call analysis | 440 | |
| Utility library for singing voice conversion work | 426 | |
| Meeting recorder CLI that transcribes and generates AI notes | 423 | |
| Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD co... | 414 | |
| Lightweight local wake word detection that recognizes phrases with just a few us... | 393 | |
| Speechless repo for sales call analysis | 384 | |
| The TTSDS benchmark evaluates synthetic speech quality by considering prosody, s... | 370 | |
| Razel Python CLI (Typer): installable command-line tool | 367 | |
| Voice I/O MCP server for Claude Code — speak and listen through your mic and spe... | 365 | |
| Python input audio. | 347 | |
| Unofficial wespeaker pypi package | 343 | |
| macOS menu bar app that captures voice, transcribes, and enhances text with AI | 329 | |
| A local/offline-capable voice assistant with speech recognition, LLM processing,... | 329 | |
| audiotool is a DeepLearning utility library. | 325 | |
| Bilingual audiobook interleaver. | 319 | |
| Real-time audio transcription for video streaming with Firefox browser integrati... | 318 | |
| Smart video cut point detection for AI-generated talking head videos using multi... | 304 | |
| Travel Vlog Automation System - Automate vlog ingestion, junk detection, and DaV... | 254 | |
| HumAwareVAD: An optimized voice activity detection model to better distinguish hu... | 247 | |
| Cascade is a production-ready, high-performance, and low-latency audio stream pr... | 229 | |
| A wake word listener for Rhasspy | 205 | |
| Lite wrapper for the useful-moonshine speech to text models | 188 | |
| A YouTube MCP Server for video information and transcription | 172 | |
| Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text | 149 | |
| Miscellaneous utilities for helping with audio transcription. | 147 | |
| JaNet diarization package | 120 | |
| A package for computing DS-WED. | 118 | |
| GOOBITS STT - Pure speech-to-text engine with multiple operation modes | 112 | |
| VietTTS: An Open-Source Vietnamese Text to Speech | 112 |