Speech Python Packages

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

120.3M 21K 3K

torchaudio

Data manipulation and transformation for audio signal processing, powered by PyTorch

12.5M 3K 770

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

4.1M 9K 934

gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

1.5M 3K 383

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1.1M 22K 2K

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

830K 9K 768

tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

204K 45K 6K

monotonic-alignment-search

Monotonically align text and speech

195K 4 1

silero

Silero Models: pre-trained text-to-speech models made embarrassingly simple

178K 6K 363

voxcpm

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

128K 17K 2K

hume

Python client for Hume AI

119K 174 44

deepfilternet

Noise suppression using deep filtering

62K 4K 443

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

54K 3K 210

penn

Pitch Estimating Neural Networks (PENN)

26K 273 26

senselab

senselab is a Python package that simplifies building pipelines for biometric (e.g. speech, voice, video) analysis.

14K 38 9

achatbot

An open-source chatbot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU-bound) or remotely (I/O-bound).

14K 89 18

pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

13K 451 80

sinapsis

Modular and Universal AI

10K 40 11

allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

9K 727 101

nkululeko

Machine learning audio prediction experiments based on templates

9K 43 12

pycodec2

Python interface to Codec 2

9K 24 8

nlp

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

8K 21K 3K

clearvoice

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

8K 4K 337

inaspeechsegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

8K 886 149