70 dependents
Package Description Downloads/month
The repository provides code for running inference and finetuning with the Meta ... 44K
endoreg-db 25K
These modules form the backbone "bare-metal" version of the Naeural Edge Protoco... 19K
OpenSportsLib is the professional library, designed for advanced video understan... 7K
open-edge-platform otx Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™ 5K
Advanced Vision-Language Model Engine for content tagging 5K
Modular media quality metrics toolkit. 3K
Unified utilities package for MiniCPM-o: includes stepaudio2 and extensible util... 2K
PaddleFormers is an easy-to-use library of pre-trained large language model zoo ... 2K
2K
A PyTorch library for multi-modal image translation with diffusion bridges, GANs... 2K
A TUI video player. 2K
NeMo Export and Deploy - a library to export and deploy LLMs and MMs 1K
[CVPR2024 Highlight] VBench - We Evaluate Video Generation 1K
VideoSeal: Video watermarking library by Facebook AI Research. 1K
Psychological and Social Interactions Feature Extraction 1K
FERAL: Feature Extraction for Recognition of Animal Locomotion 1K
Cosmos-Predict2 is a collection of general-purpose world foundation models for P... 1K
PegasusX: The Future of Multimodal Embeddings 🦄 🦄 909
SoccerNetPro is the professional extension of the popular SoccerNet library, des... 847
Multimodal SDK 809
Tool for annotating time series data from sources such as IMUs, MoCap, Videos an... 798
Awesome video understanding toolkits based on PaddlePaddle. It supports video da... 772
759
716
Evaluating Text-to-Visual Generation with Image-to-Text Generation. 704
DWPose component from ControlNeXt for whole-body pose estimation 497
PyTorch dataset which uses PySceneDetect to split videos into scenes 454
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understa... 373
345
[CVPR2024 Highlight] VBench - We Evaluate Video Generation 344
[ICML26 Spotlight] UniPercept: Towards Unified Perceptual-Level Image Understand... 334
This repo is a fork of the original VLM2Vec repo, modified for easy Pyserini int... 326
LAVIS - A One-stop Library for Language-Vision Intelligence 309
OpenCompass VLM Evaluation Kit - packaged by NVIDIA 286
GigaDatasets: A Unified and Lightweight Framework for Data Processing, Curation,... 274
Modified from the official PyTorch codebase for the video joint-embedding predi... 272
Interactive data visualization for signals, plots, and videos. 272
LAVIS - A One-stop Library for Language-Vision Intelligence 229
JEPA research code. 220
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 20... 208
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade develo... 202
Unified utilities package for MiniCPM-o: includes cosyvoice + stepaudio2 and ext... 189
AskVideos-VideoCLIP model 188
Finetrainers is a work-in-progress library to support (accessible) training of d... 175
149
A CLI tool designed to find and handle duplicate files 127
Light, Efficient, Omni-modal & Reward-model Driven Reinforcement Fine-Tuning Fra... 120
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! 119
A Video Quality Analysis Toolkit 118