Speech Python Packages

datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

116.3M 21K 3K

torchaudio

Data manipulation and transformation for audio signal processing, powered by PyTorch

12.3M 3K 770

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

4M 9K 934

gtts

Python library and CLI tool to interface with Google Translate's text-to-speech API

1.4M 3K 383

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1.1M 22K 2K

silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

872K 9K 768
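Voice activity detection marks which stretches of a recording contain speech. Silero VAD does this with a pre-trained neural network. The minimal pure-Python sketch below uses a plain RMS-energy threshold instead, only to show the kind of output a VAD produces: speech segments as sample ranges. The function name `energy_vad`, the 160-sample frame (10 ms at an assumed 16 kHz rate), and the 0.02 threshold are hypothetical choices. They are not Silero's API or defaults.

```python
import math


def energy_vad(samples, frame_len=160, threshold=0.02):
    """Return (start, end) sample ranges whose frames exceed an RMS-energy threshold.

    Hypothetical helper, not Silero's API. Assumes float samples in [-1.0, 1.0];
    a trailing partial frame is ignored.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms > threshold and start is None:
            start = f * frame_len  # speech begins at this frame
        elif rms <= threshold and start is not None:
            segments.append((start, f * frame_len))  # speech ended before this frame
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))  # speech runs to the end
    return segments
```

Loud non-speech noise easily fools an energy threshold. Learned detectors are designed to overcome that weakness.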

monotonic-alignment-search

Monotonically align text and speech

202K 4 1
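Monotonic alignment search (MAS), introduced with Glow-TTS, uses dynamic programming to find the most likely monotonic mapping from spectrogram frames to text tokens. Each frame is assigned to exactly one token, and from one frame to the next the token index either stays the same or advances by one. Below is a minimal pure-Python sketch of that recurrence. It assumes the log-likelihood matrix arrives as nested lists (`log_p[i][j]` for token `i` and frame `j`). The function name is hypothetical, and this is not the package's API.

```python
import math


def monotonic_alignment_search(log_p):
    """Return path where path[j] is the text-token index aligned to frame j.

    log_p[i][j] is the log-likelihood of frame j under token i (hypothetical input format).
    """
    n_tokens, n_frames = len(log_p), len(log_p[0])
    if n_tokens > n_frames:
        raise ValueError("need at least as many frames as tokens")
    neg_inf = -math.inf
    # q[i][j]: best total log-likelihood of aligning frames 0..j with frame j on token i.
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_p[0][0]  # the alignment must start at token 0, frame 0
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):  # token i is unreachable before frame i
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else neg_inf
            q[i][j] = log_p[i][j] + max(stay, advance)
    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_tokens - 1
    for j in range(n_frames - 1, 0, -1):
        path[j] = i
        if i > 0 and q[i - 1][j - 1] > q[i][j - 1]:
            i -= 1  # the previous frame belonged to the preceding token
    path[0] = i
    return path
```

Counting the frames that map to each token, e.g. `[path.count(i) for i in range(len(log_p))]`, turns the alignment into per-token durations. Those counts are the kind of target a TTS duration predictor can be trained on.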

tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

202K 45K 6K

silero

Silero Models: pre-trained text-to-speech models made embarrassingly simple

172K 6K 363

hume

Python client for Hume AI

124K 174 44

voxcpm

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

122K 17K 2K

deepfilternet

Noise suppression using deep filtering

63K 4K 443

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

49K 3K 210

penn

Pitch Estimating Neural Networks (PENN)

26K 273 26
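PENN estimates pitch with neural networks. For comparison, here is a minimal classical baseline that is not PENN's method. It is a time-domain autocorrelation estimator that searches an assumed 50 to 500 Hz range, picks the lag with the strongest self-similarity, and reports `sample_rate / lag` as the fundamental frequency. The function name and the search range are illustrative assumptions.

```python
import math


def autocorr_pitch(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of one frame, or None if none is found.

    Hypothetical classical baseline, not PENN's API or algorithm.
    """
    # Candidate lags (in samples) correspond to periods between 1/fmax and 1/fmin.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: fewer terms enter at larger lags,
        # which mildly favors shorter periods over subharmonics.
        corr = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # No positively correlated lag: treat the frame as unvoiced.
    return None if best_lag is None else sample_rate / best_lag
```

Returning `None` when no lag has positive correlation is only a crude stand-in for a voicing decision. Practical pitch trackers, neural or classical, typically add explicit voicing detection and octave-error handling.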

achatbot

An open-source chatbot architecture for voice/vision (and multimodal) assistants that can run locally (CPU/GPU-bound) or remotely (I/O-bound).

14K 89 18

senselab

senselab is a Python package that simplifies building pipelines for biometric (e.g., speech, voice, video) analysis.

13K 38 9

pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

13K 451 80

sinapsis

Modular and Universal AI

10K 40 11

allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages

10K 727 101

nkululeko

Machine learning audio prediction experiments based on templates

9K 43 12

pycodec2

Python interface to Codec 2

9K 24 8

inaspeechsegmenter

CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.

8K 886 149

clearvoice

An AI-powered speech processing toolkit with open-source SOTA pretrained models, supporting speech enhancement, speech separation, target speaker extraction, and more.

8K 4K 337

nlp

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

8K 21K 3K
