30 dependents
| Package description | Downloads/month |
|---|---:|
| Generative models for conditional audio generation | 118K |
| Build high-performance AI models with modular building blocks | 19K |
| 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice ... | 3K |
| Video Diffusion - Pytorch | 2K |
| 🐍 🤖 Pip installable package for StyleTTS 2 human-level text-to-speech and voice ... | 2K |
| [ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents | 1K |
| Implementation of Parti, Google's pure attention-based text-to-image neural netw... | 1K |
| [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level c... | 815 |
| This repository provides example code for a Docker container for the UNICORN Cha... | 766 |
| Evaluating Text-to-Visual Generation with Image-to-Text Generation. | 704 |
| Audio generation using diffusion models, in PyTorch. | 530 |
| Dora Node for VLM | 451 |
| Generative models for conditional audio generation | 371 |
| Transformers at zeta scales | 369 |
| 🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of Deep... | 359 |
| [CVPR2024 Highlight] VBench - We Evaluate Video Generation | 344 |
| [NeurIPS 2025] PyTorch implementation of [ThinkSound], a unified framework for g... | 310 |
| THUNDER: Tile-level Histopathology image UNDERstanding benchmark | 273 |
| | 260 |
| Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-l... | 238 |
| Fork of StyleTTS 2 Python package. StyleTTS 2: Towards Human-Level Text-to-Speech... | 222 |
| Embedding of whole slide images with PRISM | 187 |
| Modified llava 1.1.3 for apiprompting. | 182 |
| Package for an easy implementation of paper "Attention Prompting on Image for La... | 161 |
| A text to speech plugin based on StyleTTS2. | 146 |
| Unofficial pip package for StyleTTS 2 | 107 |
| Chiluka - A lightweight TTS inference package based on StyleTTS2 | 99 |
| A document analysis program | 96 |
| All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 202... | 80 |
| A package for preprocessing whole-slide images. | 50 |