54 dependents
Package Description Downloads/month
An immersion toolkit for learning languages through games and other visual media... 49K
Tools for building wake-word and speech-command datasets and models. 7K
Fish Speech 6K
Very fast, accurate speaker diarization 4K
Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache... 4K
🥰 Building AI-based conversational avatars lightning fast ⚡️💬 3K
Lightweight offline voice assistant for hands-free music control (YouTube Music ... 3K
Use your voice to trigger events and communicate with AI Agents. 3K
high quality multi-lingual speech to text 2K
Speech recognition with accurate word-level timestamps. 2K
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Spe... 1K
1K
Forced alignment pipeline designed for efficiency and ease of use. 1K
An ASR API server for FunASR 1K
Dora Node for Text translating using Argostranslate 1K
Python toolkit for the Qwen3-ASR API—parallel high‑throughput calls, robust long... 1K
Useful audio tools 959
Normalize WAV voice recordings to a consistent target dB level using AGC, VAD, a... 841
Audio ZEN is a library for audio/speech signal processing. 809
Generic Speech AI Platform - Ollama for Voice Models 645
Package to run ABC-MRT16 intelligibility tests 610
Speechless repo for sales call analysis 440
Utility library for singing voice conversion work 426
Meeting recorder CLI that transcribes and generates AI notes 423
Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD co... 414
Lightweight local wake word detection that recognizes phrases with just a few us... 393
Speechless repo for sales call analysis 384
The TTSDS benchmark evaluates synthetic speech quality by considering prosody, s... 370
Razel Python CLI (Typer): installable command-line tool 367
Voice I/O MCP server for Claude Code — speak and listen through your mic and spe... 365
Python input audio. 347
Unofficial wespeaker pypi package 343
macOS menu bar app that captures voice, transcribes, and enhances text with AI 329
A local/offline-capable voice assistant with speech recognition, LLM processing,... 329
audiotool is a DeepLearning utility library. 325
Bilingual audiobook interleaver. 319
Real-time audio transcription for video streaming with Firefox browser integrati... 318
Smart video cut point detection for AI-generated talking head videos using multi... 304
Travel Vlog Automation System - Automate vlog ingestion, junk detection, and DaV... 254
HumAwareVAD: An optimized voice activity detection model to better distinguish hu... 247
Cascade is a production-ready, high-performance, and low-latency audio stream pr... 229
A wake word listener for Rhasspy 205
Lite wrapper for the useful-moonshine speech to text models 188
A YouTube MCP Server for video information and transcription 172
Local Voice Transcription System - Privacy-first, model-agnostic speech-to-text 149
Miscellaneous utilities for helping with audio transcription. 147
JaNet diarization package 120
A package for computing DS-WED. 118
GOOBITS STT - Pure speech-to-text engine with multiple operation modes 112
VietTTS: An Open-Source Vietnamese Text to Speech 112