22 dependents

| Package description | Downloads/month |
| --- | --- |
| Open-source library for scalable, reproducible evaluation of AI models and bench... | 37K |
| A framework for evaluating language models - packaged by NVIDIA | 14K |
| Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) | 2K |
| LiveCodeBench - packaged by NVIDIA | 2K |
| Open AI simple evals - packaged by NVIDIA | 1K |
| IFBench: A challenging benchmark for precise instruction following | 1K |
| The tau2 package - packaged by NVIDIA | 698 |
| MTBench evaluator - packaged by NVIDIA | 320 |
| BigCode Evaluation Harness - packaged by NVIDIA | 308 |
| OpenCompass VLM Evaluation Kit - packaged by NVIDIA | 286 |
| Content safety evaluation tool - packaged by NVIDIA | 270 |
| the LLM vulnerability scanner | 242 |
| MMATH - packaged by NVIDIA | 212 |
| Humanity's Last Exam adaptation - packaged by NVIDIA | 212 |
| A benchmark that challenges language models to code solutions for scientific pro... | 200 |
| Evaluating tool-augmented LLMs in a conversational setting - packaged by NVIDIA | 195 |
| The Triton Inference Server provides an optimized cloud and edge inferencing sol... | 189 |
| Library for evaluating Large Language Models on CUDA code | 173 |
| Holistic Evaluation of Language Models (HELM) is an open source Python framework... | 137 |
| Professional domain benchmark for evaluating LLMs on Physics PhD, Chemistry PhD,... | 127 |
| Long context evaluations - packaged by NVIDIA NeMo Evaluator | 93 |
| Artificial Analysis Long Context Reasoning (AA-LCR) adaptation - packaged by NVI... | 72 |