Speech Processing Python Packages

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

872K 9K 768

torchscale

Foundation Architecture for (M)LLMs

80K 3K 225

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

49K 3K 210

indic-num2words

Python library for converting numbers to words for all Indian Languages.

23K 36 14

spafe

spafe: Simplified Python Audio Features Extraction

16K 483 78

pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

13K 451 80

swift-f0

Fast and accurate fundamental frequency (F0) detector using convolutional neural networks

10K 154 20

resemble-enhance

AI-powered speech denoising and enhancement

6K 2K 273

silero-vad-lite

Lightweight wrapper for Silero VAD that uses an internal ONNX Runtime and has no Python package dependencies

5K 16 1

voicefixer

General Speech Restoration

4K 1K 157

diarize

Speaker diarization for Python — "who spoke when?" CPU-only, no API keys, Apache 2.0. ~10.8% DER on VoxConverse, 8x faster than real-time.

4K 62 7

nnmnkwii

Library to build speech synthesis systems designed for easy and fast prototyping.

3K 399 71

vak

A neural network framework for researchers studying acoustic communication

2K 91 17

bournemouth-forced-aligner

Extract phoneme-level timestamps from speech audio.

2K 130 14

deepaudio-x

A Python library to train deep neural networks on various audio tasks using self-supervised backbones.

2K 27 0

signal-transformation

Widely used signal transformations using the TensorFlow API.

2K 1 0

everyvoice

The EveryVoice TTS Toolkit - Text To Speech for your language

2K 43 4

polyglotdb

PolyglotDB is a package for phonetic corpus storage and analysis

1K 51 17

lfeats

A unified interface to extract hidden representations from speech foundation models

1K 1 0

vistec-ser

Speech Emotion Recognition using PyTorch sponsored by AIS and VISTEC-DEPA AIResearch Institute Thailand.

930 3 2

fastwer

A PyPI package for fast word/character error rate (WER/CER) calculation

839 70 16

scoreq

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

806 108 8

stark-engine

S.T.A.R.K - Speech and Text Algorithmic Recognition Kit. Modern framework for creating powerful voice assistants.

736 63 4

zpe-prosody

728 1 0
