Voice Activity Detection Python Packages

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

872K 9K 768

funasr

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

357K 16K 2K

auditok

An audio/acoustic activity detection and audio segmentation tool

27K 845 100

sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

17K 2K 213

omnivad

OmniVAD — Cross-platform Voice Activity Detection and Audio Event Detection (based on FireRedVAD)

14K 17 1

ffsubsync

Automagically synchronize subtitles with video.

13K 8K 315

inaspeechsegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

8K 886 149

funasr-onnx

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

7K 16K 2K

diart

A Python package to build AI-powered real-time audio applications

6K 2K 161

silero-vad-lite

Lightweight wrapper for Silero VAD using internal ONNX Runtime and with no python package dependencies

5K 16 1

whisper-s2t

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

5K 568 76

diarize

Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.

4K 62 7

subaligner

Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/

3K 506 24

fireredvad

SOTA industrial-grade Voice Activity Detection & Audio Event Detection, supporting 100+ languages, outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD

3K 367 26

pvcobra

On-device voice activity detection (VAD) powered by deep learning

2K 251 17

ffvoice

🎙️ High-performance C++ voice engine with real-time audio processing, AI speech recognition (ASR), transcription while recording, and RNNoise

1K 2 0

pvcobrademo

On-device voice activity detection (VAD) powered by deep learning

948 251 17

livekit-plugins-tenvad

TEN VAD low-latency voice activity detection for real-time streaming, integrated with livekit-agents

813 24 6

py-nltools

A collection of basic python modules for spoken natural language processing

774 55 15

funasr-torch

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

718 16K 2K

sherpa-ncnn-core

660 2K 213

sherpa-ncnn-bin

625 2K 213

spectra-torch

Spectra extraction tutorials based on torch and torchaudio.

476 41 4

open-voice-activity-detection

Fully open-source and state-of-the-art Voice Activity Detection (VAD) models for academic research and commercial applications.

416 7 0
