99 dependents
| Description | Downloads/month |
|---|---|
| 🤗 AutoTrain Advanced | 34K | |
| Generate karaoke videos, by downloading audio and lyrics, separating instrumenta... | 31K | |
| Framework for building deep neural network models for sound, speech, and voice A... | 7K | |
| An open-source evaluation framework for voice agents | 6K | |
| | 6K | |
| zero-shot voice conversion & singing voice conversion, with real-time support | 5K | |
| ASR text preprocessing utility | 4K | |
| Language understanding toolkit for human dialogs. | 4K | |
| Advanced Machine Learning Training Platform - IN DEVELOPMENT | 3K | |
| Automatically create synchronised lyrics files in ASS and LRC with word-level ti... | 3K | |
| Evals is a framework for evaluating LLMs and LLM systems, and an open-source reg... | 3K | |
| Breton speech recognizer with Vosk | 2K | |
| Evaluation of Docling | 2K | |
| Scorer for the HIPE-OCRepair evaluation campaign. | 2K | |
| De novo peptide sequencing with InstaNovo: Accurate, database-free peptide ident... | 2K | |
| A package for audio transcription and speaker diarization using Whisper and NeMo... | 2K | |
| jiwer-compatible WER normalizer with number, email, URL, filler, and symbol norm... | 2K | |
| A Python package for aggregating and processing RSS feeds with LLM-enhanced cont... | 1K | |
| ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription | 1K | |
| VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Spe... | 1K | |
| Utility for project ratchada speech to text | 1K | |
| Common python code used in our backend services. | 1K | |
| This library is designed to augment audio data for machine learning purposes. It... | 1K | |
| An evaluation framework for Serbian Whisper models. | 1K | |
| Utilities for 'Hands-On Generative AI with Transformers and Diffusion Models' (u... | 1K | |
| SLTev is a tool for comprehensive evaluation of (simultaneous) spoken language t... | 982 | |
| Add language model support to HF Transformers' Whisper models | 980 | |
| WhisperFlow: Real-Time Transcription Powered by OpenAI Whisper | 878 | |
| Amazon Foundation Model Evaluations | 838 | |
| Automatic lyrics transcription evaluation toolkit | 832 | |
| Phoneme Aligner | 805 | |
| 🏆 Run benchmarks against the most common ASR tools on the market. | 798 | |
| Dictionary-Based, Variety-Aware Lemmatizer for Romansh | 790 | |
| Evaluate chaptering quality for audio and video content in time space, supportin... | 765 | |
| Library to score audio medical case studies | 727 | |
| VAD-driven streaming voice dictation for macOS — local Whisper ASR + Silero VAD ... | 683 | |
| HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools | 623 | |
| Automatic Speech Analysis for Cognitive Assessment | 599 | |
| HTRflow is the underlying engine for our HTR-pipeline | 598 | |
| eXtensive Audio Representation and Evaluation Suite | 576 | |
| A library to perform automatic speech recognition with huggingface transformers. | 531 | |
| A benchmarking and analysis framework for Russian ASR models | 490 | |
| SubER - Subtitle Edit Rate | 489 | |
| Ichigo is an open, ongoing research experiment to extend a text-based LLM to hav... | 475 | |
| Voice conversion toolkit based on S3PRL: Self-Supervised Speech/Sound Pre-traini... | 473 | |
| Add your description here | 426 | |
| PDF processing pipeline: remove headers/footers, convert to markdown, and genera... | 410 | |
| Effective evaluations for Text-to-Speech (TTS) systems | 390 | |
| A simple, low-dependency package for aligning sentences by minimizing a chosen m... | 382 | |
| TFM | 375 |