A Python package to build AI-powered real-time audio applications
Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.
On-device speaker recognition engine powered by deep learning
Luigi pipeline to download VoxCeleb(2) audio from YouTube and extract speaker segments