Speech Recognition Python Packages

transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

144.1M 160K 33K

speechrecognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

7.8M 9K 2K

faster-whisper

Faster Whisper transcription with CTranslate2

7.6M 23K 2K

deepgram-sdk

Official Python SDK for Deepgram.

2.2M 424 127

openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

1.4M 10K 3K

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

1.1M 22K 2K

kaldiio

A pure Python module for reading and writing Kaldi ark files

853K 268 36

vosk

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

476K 15K 2K

funasr

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

363K 16K 2K

cn2an

📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)

273K 759 82

rev-ai

Rev AI Python SDK

262K 36 13

openvino-dev

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

172K 10K 3K

pocketsphinx

A small speech recognizer

119K 4K 729

pvporcupine

On-device wake word detection powered by deep learning

119K 5K 573

mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

87K 7K 578

pytorch-pretrained-bert

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

81K 160K 33K

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

54K 3K 210

speechmatics-python

Python library and CLI for Speechmatics

51K 75 23

pytorch-transformers-pvt-nightly

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

47K 160K 33K

whisper-ctranslate2

Whisper command-line client, based on CTranslate2, compatible with the original OpenAI client.

38K 1K 124

pytorch-transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

31K 160K 33K

espnet

End-to-End Speech Processing Toolkit

29K 10K 2K

deepspeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

28K 27K 4K

onnx-asr

A lightweight Python package for Automatic Speech Recognition using ONNX models

22K 311 30