30 dependents
Package Description Downloads/month
Generative models for conditional audio generation 118K
Build high-performance AI models with modular building blocks 19K
🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice ... 3K
Video Diffusion - Pytorch 2K
🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice ... 2K
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents 1K
Implementation of Parti, Google's pure attention-based text-to-image neural netw... 1K
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level c... 815
This repository provides example code for a Docker container for the UNICORN Cha... 766
Evaluating Text-to-Visual Generation with Image-to-Text Generation. 704
Audio generation using diffusion models, in PyTorch. 530
Dora Node for VLM 451
Generative models for conditional audio generation 371
Transformers at zeta scales 369
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of Deep... 359
[CVPR2024 Highlight] VBench - We Evaluate Video Generation 344
[NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... 310
THUNDER: Tile-level Histopathology image UNDERstanding benchmark 273
260
Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-l... 238
Fork of StyleTTS 2 Python package. StyleTTS 2: Towards Human-Level Text-to-Speech... 222
Embedding of whole slide images with PRISM 187
Modified llava 1.1.3 for apiprompting. 182
Package for an easy implementation of paper "Attention Prompting on Image for La... 161
A text to speech plugin based on StyleTTS2. 146
Unofficial pip package for StyleTTS 2 107
Chiluka - A lightweight TTS inference package based on StyleTTS2 99
A document analysis program 96
All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 202... 80
A package for preprocessing whole-slide images. 50