silero-vad
Very fast, accurate speaker diarization
Real-time speech-to-text clipboard tool with Silero VAD and local ASR support
Production-grade Traditional Chinese / Taiwan Mandarin speech-to-text. Qwen3-ASR + MediaTek Breeze-ASR-25, hot-word injection, LLM polish, speaker diarization. RTF up to 1554x on RTX 5090, 56 TDD tests.
Real-time speech-to-text translation over WebSocket. Streams Opus or raw PCM audio from client to server for live transcription and optional translation. Supports CLI and Python API.