1,099 dependents
| Description | Downloads/month |
|---|---|
| SGLang is a high-performance serving framework for large language models and mul... | 298.9M |
| A high-throughput and memory-efficient inference and serving engine for LLMs | 9.2M |
| State-of-the-art speaker diarization toolkit | 2.2M |
| Fast audio data augmentation in PyTorch. Inspired by audiomentations. Useful for... | 1.7M |
| Pitch-shift audio clips quickly with PyTorch (CUDA supported)! Additional utilit... | 1.7M |
| All-in-one speech toolkit in pure Python and Pytorch | 1.6M |
| WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarizatio... | 1.1M |
| Silero VAD: pre-trained enterprise-grade Voice Activity Detector | 857K |
| Open-Unmix - Music Source Separation for PyTorch | 496K |
| MatterSim: A deep learning atomistic model across elements, temperatures and pre... | 423K |
| Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) propos... | 292K |
| Pytorch implementation of the CREPE pitch tracker | 265K |
| A robust, efficient, low-latency speech-to-text library with advanced voice acti... | 218K |
| Qwen-TTS python package | 212K |
| 🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and p... | 204K |
| Open Audio Watermarking Tool | 158K |
| The official Pytorch implementation of Fast Context-based Pitch Estimation (FCPE... | 139K |
| VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice D... | 125K |
| Audiocraft is a library for audio processing and generation with deep learning. ... | 122K |
| Generative models for conditional audio generation | 117K |
| Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech wi... | 106K |
| High-Quality Voice Cloning TTS for 600+ Languages | 105K |
| SoTA open-source TTS | 98K |
| Deep learning software to decode EEG, ECG or MEG signals | 66K |
| Transcription, forced alignment, and audio indexing with OpenAI's Whisper | 54K |
| An immersion toolkit for learning Languages through games and other visual media... | 49K |
| Using RVC via console or python scripts | 38K |
| Unified automatic quality assessment for speech, music, and sound. | 37K |
| ⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN SDK f... | 30K |
| Wheels & Docker images for running vLLM on CPU-only systems, optimized for diffe... | 30K |
| Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) | 29K |
| Pitch Estimating Neural Networks (PENN) | 26K |
| endoreg-db | 25K |
| A package for NeuCodec, based on xcodec2. | 24K |
| A nearly-live implementation of OpenAI's Whisper. | 20K |
| A simple FastAPI Server to run XTTSv2 | 20K |
| These modules form the backbone "bare-metal" version of the Naeural Edge Protoco... | 20K |
| Generalizable Perception Stack for all things 3D, 4D & Scene Understanding | 18K |
| AI-based Audio Watermarking Tool | 18K |
| Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation... | 18K |
| CLAP (Contrastive Language-Audio Pretraining) is a model that learns acoustic co... | 17K |
| Offline inference engine for art, real-time voice conversations, LLM powered cha... | 14K |
| Accurate and general beat tracker | 14K |
| senselab is a Python package that simplifies building pipelines for biometric (e... | 13K |
| Toward High-Accuracy Open-Source Biomolecular Structure Prediction. | 13K |
| Python implementation of an ASR service | 12K |
| Text-Acoustic Dual-Aligned Language Model | 12K |
| so-vits-svc fork with realtime support, improved interface and more features. | 11K |
| AI-driven Beamline Controller | 10K |
| Python Speech Language Sample Analysis | 10K |