70 dependents
| Package | Description | Downloads/month |
|---|---|---|
| | The repository provides code for running inference and finetuning with the Meta ... | 44K |
| endoreg-db | | 25K |
| | These modules form the backbone "bare-metal" version of the Naeural Edge Protoco... | 19K |
| | OpenSportsLib is the professional library, designed for advanced video understan... | 7K |
| | Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™ | 5K |
| | Advanced Vision-Language Model Engine for content tagging | 5K |
| | Modular media quality metrics toolkit. | 3K |
| | Unified utilities package for MiniCPM-o: includes stepaudio2 and extensible util... | 2K |
| | PaddleFormers is an easy-to-use library of pre-trained large language model zoo ... | 2K |
| | | 2K |
| | A PyTorch library for multi-modal image translation with diffusion bridges, GANs... | 2K |
| | A TUI video player. | 2K |
| | NeMo Export and Deploy - a library to export and deploy LLMs and MMs | 1K |
| | [CVPR2024 Highlight] VBench - We Evaluate Video Generation | 1K |
| | VideoSeal: Video watermarking library by Facebook AI Research. | 1K |
| | Psychological and Social Interactions Feature Extraction | 1K |
| | FERAL: Feature Extraction for Recognition of Animal Locomotion | 1K |
| | Cosmos-Predict2 is a collection of general-purpose world foundation models for P... | 1K |
| | PegasusX: The Future of Multimodal Embeddings 🦄 🦄 | 909 |
| | SoccerNetPro is the professional extension of the popular SoccerNet library, des... | 847 |
| | Multimodal SDK | 809 |
| | Tool for annotating time series data from sources such as IMUs, MoCap, Videos an... | 798 |
| | Awesome video understanding toolkits based on PaddlePaddle. It supports video da... | 772 |
| | | 759 |
| | | 716 |
| | Evaluating Text-to-Visual Generation with Image-to-Text Generation. | 704 |
| | DWPose component from ControlNeXt for whole-body pose estimation | 497 |
| | PyTorch dataset which uses PySceneDetect to split videos into scenes | 454 |
| | [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understa... | 373 |
| | | 345 |
| | [CVPR2024 Highlight] VBench - We Evaluate Video Generation | 344 |
| | [ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understand... | 334 |
| | This repo is a fork of the original VLM2Vec repo, modified for easy Pyserini int... | 326 |
| | LAVIS - A One-stop Library for Language-Vision Intelligence | 309 |
| | OpenCompass VLM Evaluation Kit - packaged by NVIDIA | 286 |
| | GigaDatasets: A Unified and Lightweight Framework for Data Processing, Curation,... | 274 |
| | Modified from the official PyTorch codebase for the video joint-embedding predi... | 272 |
| | Interactive data visualization for signals, plots, and videos. | 272 |
| | LAVIS - A One-stop Library for Language-Vision Intelligence | 229 |
| | JEPA research code. | 220 |
| | LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 20... | 208 |
| | The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade develo... | 202 |
| | Unified utilities package for MiniCPM-o: includes cosyvoice + stepaudio2 and ext... | 189 |
| | AskVideos-VideoCLIP model | 188 |
| | Finetrainers is a work-in-progress library to support (accessible) training of d... | 175 |
| | | 149 |
| | A CLI tool designed to find and handle duplicate files | 127 |
| | Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Fra... | 120 |
| | State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! | 119 |
| | a Video Quality Analysis Toolkit | 118 |