22 dependents
| Description | Downloads/month |
|---|---|
| Open-source library for scalable, reproducible evaluation of AI models and bench... | 37K |
| A framework for evaluating language models - packaged by NVIDIA | 14K |
| Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls) | 2K |
| LiveCodeBench - packaged by NVIDIA | 2K |
| Open AI simple evals - packaged by NVIDIA | 1K |
| IFBench: A challenging benchmark for precise instruction following | 1K |
| The tau2 package - packaged by NVIDIA | 698 |
| MTBench evaluator - packaged by NVIDIA | 320 |
| BigCode Evaluation Harness - packaged by NVIDIA | 308 |
| OpenCompass VLM Evaluation Kit - packaged by NVIDIA | 286 |
| Content safety evaluation tool - packaged by NVIDIA | 270 |
| the LLM vulnerability scanner | 242 |
| MMATH - packaged by NVIDIA | 212 |
| Humanity's last exam adaptation - packaged by NVIDIA | 212 |
| A benchmark that challenges language models to code solutions for scientific pro... | 200 |
| Evaluating tool-augmented LLMs in a conversational setting - packaged by NVIDIA | 195 |
| The Triton Inference Server provides an optimized cloud and edge inferencing sol... | 189 |
| Library for evaluating Large Language Models on CUDA code | 173 |
| Holistic Evaluation of Language Models (HELM) is an open source Python framework... | 137 |
| Professional domain benchmark for evaluating LLMs on Physics PhD, Chemistry PhD,... | 127 |
| Long context evaluations - packaged by NVIDIA NeMo Evaluator | 93 |
| Artificial Analysis Long Context Reasoning (AA-LCR) adaptation - packaged by NVIDIA | 72 |