83 dependents
Package Description Downloads/month
Contrastive Language-Audio Pretraining 175K
Generative models for conditional audio generation 118K
High-Quality Voice Cloning TTS for 600+ Languages 103K
Megatron's multi-modal data loader 77K
Easily turn large sets of image urls to an image dataset. Can download, resize a... 47K
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural netw... 15K
(ML) audio engineering i/o utils 9K
Tune image classifier using Ray Tune + SkyPilot 7K
Easily compute clip embeddings and build a clip retrieval system with them 6K
6K
Provide ready-to-use dataloader for deep learning models 5K
An open-source computer vision framework for wildlife image analysis, featuring ... 4K
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognitio... 3K
Pretrained Unet image to image model for dtd alpha layer masked image to graysca... 3K
Training utility library and config manager for Granular Machine Vision research 3K
Self-contained distributed community captioning system 3K
A loose federation of distributed, typed datasets 2K
Unlearning Algorithms 2K
A PyTorch library for multi-modal image translation with diffusion bridges, GANs... 2K
A modular high-level library to train embodied AI agents across a variety of tas... 2K
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of... 2K
A package for audio transcription and speaker diarization using Whisper and NeMo... 2K
NeMo Export and Deploy - a library to export and deploy LLMs and MMs 1K
An ML package for GStreamer 1K
Cosmos-Predict2 is a collection of general-purpose world foundation models for P... 1K
A pipeline for curating and sanitizing large-scale image datasets. 1K
Physics-guided flow models for weather prediction with flow matching 909
Audio text dataset for PyTorch training based on webdataset. 841
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT ... 608
A High Performance Patching Library for Digital Pathology AI dataset creation. 591
eXtensive Audio Representation and Evaluation Suite 576
Classifies galaxy morphology with Bayesian CNN 559
Aria training and evaluation kits 557
A lightweight framework for evaluating visual-language models. 557
Training tools and scripts for FluxFlow text-to-image generation 514
Tools and utilities for the HoHo Dataset and S23DR Competition 474
A PyTorch toolkit for audio synthesis. 447
PDF processing pipeline: remove headers/footers, convert to markdown, and genera... 410
Tool for creating patches from geo-referenced and non geo-referenced image and l... 395
Distributed preprocessing for deep learning. 395
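OneScience is a scientific computing toolkit built on advanced deep learning frameworks, designed to accelerate scientific research and technology development through a set of highly integrated components. 388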
A PDF to Word conversion tool 381
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understa... 373
Generative models for conditional audio generation 371
Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t... 370
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching 364
A modular ML framework for training and evaluation tasks 362
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of Deep... 359
[CVPR2024 Highlight] VBench - We Evaluate Video Generation 344
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... 310