83 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | Contrastive Language-Audio Pretraining | 175K |
| | Generative models for conditional audio generation | 118K |
| | High-Quality Voice Cloning TTS for 600+ Languages | 103K |
| | Megatron's multi-modal data loader | 77K |
| | Easily turn large sets of image urls to an image dataset. Can download, resize a... | 47K |
| | Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural netw... | 15K |
| | (ML) audio engineering i/o utils | 9K |
| | Tune image classifier using Ray Tune + SkyPilot | 7K |
| | Easily compute clip embeddings and build a clip retrieval system with them | 6K |
| | | 6K |
| | Provide ready-to-use dataloader for deep learning models | 5K |
| | An open-source computer vision framework for wildlife image analysis, featuring ... | 4K |
| | UniMERNet: A Universal Network for Real-World Mathematical Expression Recognitio... | 3K |
| | Pretrained Unet image to image model for dtd alpha layer masked image to graysca... | 3K |
| | Training utility library and config manager for Granular Machine Vision research | 3K |
| | Self-contained distributed community captioning system | 3K |
| | A loose federation of distributed, typed datasets | 2K |
| | Unlearning Algorithms | 2K |
| | A PyTorch library for multi-modal image translation with diffusion bridges, GANs... | 2K |
| | A modular high-level library to train embodied AI agents across a variety of tas... | 2K |
| | Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of... | 2K |
| | A package for audio transcription and speaker diarization using Whisper and NeMo... | 2K |
| | NeMo Export and Deploy - a library to export and deploy LLMs and MMs | 1K |
| | An ML package for GStreamer | 1K |
| | Cosmos-Predict2 is a collection of general-purpose world foundation models for P... | 1K |
| | A pipeline for curating and sanitizing large-scale image datasets. | 1K |
| | Physics-guided flow models for weather prediction with flow matching | 909 |
| | Audio text dataset for PyTorch training based on webdataset. | 841 |
| | 🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT ... | 608 |
| | A High Performance Patching Library for Digital Pathology AI dataset creation. | 591 |
| | eXtensive Audio Representation and Evaluation Suite | 576 |
| | Classifies galaxy morphology with Bayesian CNN | 559 |
| | Aria training and evaluation kits | 557 |
| | A lightweight framework for evaluating visual-language models. | 557 |
| | Training tools and scripts for FluxFlow text-to-image generation | 514 |
| | Tools and utilities for the HoHo Dataset and S23DR Competition | 474 |
| | A PyTorch toolkit for audio synthesis. | 447 |
| | PDF processing pipeline: remove headers/footers, convert to markdown, and genera... | 410 |
| | Tool for creating patches from geo-referenced and non geo-referenced image and l... | 395 |
| | Distributed preprocessing for deep learning. | 395 |
| | OneScience is a scientific computing toolkit built on advanced deep learning frameworks, designed to accelerate scientific research and technology development through a set of highly integrated components. | 388 |
| | A PDF to Word conversion tool | 381 |
| | [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understa... | 373 |
| | Generative models for conditional audio generation | 371 |
| | Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... | 370 |
| | ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | 364 |
| | A modular ML framework for training and evaluation tasks | 362 |
| | 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of Deep... | 359 |
| | [CVPR2024 Highlight] VBench - We Evaluate Video Generation | 344 |
| | [NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... | 310 |