Dependents of webdataset

83 dependents

Package	Description	Downloads/month
laion-clap	Contrastive Language-Audio Pretraining	175K
stable-audio-tools	Generative models for conditional audio generation	118K
omnivoice	High-Quality Voice Cloning TTS for 600+ Languages	103K
megatron-energon	Megatron's multi-modal data loader	77K
img2dataset	Easily turn large sets of image urls to an image dataset. Can download, resize a...	47K
dalle2-pytorch	Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural netw...	15K
aeiou	(ML) audio engineering i/o utils	9K
krunic	Tune image classifier using Ray Tune + SkyPilot	7K
clip-retrieval	Easily compute clip embeddings and build a clip retrieval system with them	6K
sapiens-transformers		6K
seqchromloader	Provide ready-to-use dataloader for deep learning models	5K
birder	An open-source computer vision framework for wildlife image analysis, featuring ...	4K
unimernet	UniMERNet: A Universal Network for Real-World Mathematical Expression Recognitio...	3K
unet-masked2mask	Pretrained Unet image to image model for dtd alpha layer masked image to graysca...	3K
phobos	Training utility library and config manager for Granular Machine Vision research	3K
caption-flow	Self-contained distributed community captioning system	3K
atdata	A loose federation of distributed, typed datasets	2K
unlearn-diff	Unlearning Algorithms	2K
pytorch-image-translation-models	A PyTorch library for multi-modal image translation with diffusion bridges, GANs...	2K
habitat-baselines	A modular high-level library to train embodied AI agents across a variety of tas...	2K
hpsv2x	Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of...	2K
fast-whisper-diarizer	A package for audio transcription and speaker diarization using Whisper and NeMo...	2K
nemo-export-deploy	NeMo Export and Deploy - a library to export and deploy LLMs and MMs	1K
gst-python-ml	An ML package for GStreamer	1K
cosmos-predict2	Cosmos-Predict2 is a collection of general-purpose world foundation models for P...	1K
vision-data-curation	A pipeline for curating and sanitizing large-scale image datasets.	1K
weatherflow	Physics-guided flow models for weather prediction with flow matching	909
atdataset	Audio text dataset for PyTorch training based on webdataset.	841
iara-stt-training	🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT ...	608
wsi-patching	A High Performance Patching Library for Digital Pathology AI dataset creation.	591
xares	eXtensive Audio Representation and Evaluation Suite	576
zoobot	Classifies galaxy morphology with Bayesian CNN	559
projectaria-atek	Aria training and evaluation kits	557
eval-mm	A lightweight framework for evaluating visual-language models.	557
fluxflow-training	Training tools and scripts for FluxFlow text-to-image generation	514
hoho2025	Tools and utilities for the HoHo Dataset and S23DR Competition	474
audyn	A PyTorch toolkit for audio synthesis.	447
multimodal-parsers	PDF processing pipeline: remove headers/footers, convert to markdown, and genera...	410
geotiff-tiler	Tool for creating patches from geo-referenced and non geo-referenced image and l...	395
tensorcom	Distributed preprocessing for deep learning.	395
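onescience	OneScience is a scientific computing toolkit built on advanced deep learning frameworks, designed to accelerate scientific research and technology development through a set of highly integrated components.	388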
freep2w	A PDF to Word conversion tool	381
moviechat	[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understa...	373
harmonai-tools	Generative models for conditional audio generation	371
ichigo-whisper	Ichigo Whisper is a compact (22M parameters), open-source speech tokenizer for t...	370
zipvoice	ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching	364
xflow-py	A modular ML framework for training and evaluation tasks	362
otter-ai	🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of Deep...	359
vbench2	[CVPR2024 Highlight] VBench - We Evaluate Video Generation	344
thinksound	[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g...	310