48 dependents
| Description | Downloads/month |
|---|---|
| Transcript Analysis for AI Agents | 560K |
| Software Engineering Agents for Inspect AI | 105K |
| Collection of evals for Inspect AI | 96K |
| Docent SDK | 47K |
| Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backend... | 22K |
| An Inspect extension for cyber evaluations | 15K |
| A Kubernetes Sandbox Environment for Inspect | 13K |
| ControlArena is a collection of settings, model organisms and protocols - for ru... | 10K |
| Agent evaluation toolkit | 6K |
| Provider-agnostic, open-source evaluation infrastructure for language models | 5K |
| SDK for the Vals AI Platform | 4K |
| An alignment auditing agent capable of quickly exploring alignment hypotheses | 3K |
| Inspect AI interface to Harbor tasks | 3K |
| Human Time-to-Completion Evaluation CLI | 3K |
| Collection of sandboxes for Inspect AI | 3K |
| Inspect Flow is a workflow stack built on Inspect AI that enables research organ... | 2K |
| A Python SDK for Asteroid | 1K |
| Framework for generating behavioral evaluations of frontier AI models. | 1K |
| | 1K |
| Integration between Inspect and Weights & Biases | 1K |
| Run OpenReward environments through the Inspect eval platform | 936 |
| Factorio Learning Environment | 917 |
| MLflow integration for Inspect AI: experiment tracking, execution tracing, and S... | 776 |
| Tools for systematic large language model evaluations | 724 |
| A Control Arena setting for evaluating agents inserting backdoors while refactor... | 721 |
| Pentester CLI | 661 |
| Podman sandbox environment for Inspect AI | 610 |
| Bench-AF: Alignment Faking Benchmark | 598 |
| Entropy Labs' Sentinel is an agent control plane that enables efficient oversigh... | 514 |
| MLflow integration for Inspect AI | 497 |
| | 445 |
| Opt-in lint for Inspect AI tasks: warn when @verifiable_task uses a model-graded... | 402 |
| Chain-of-thought monitorability and faithfulness evaluation for reasoning-model ... | 387 |
| Generalized Cognitive Refinement Iteration - Hierarchical Multi-Agent System wit... | 386 |
| Pluralistic alignment evaluation benchmark for LLMs | 275 |
| An utterly useless package that imports everything for you. Now with top 1000 PyP... | 247 |
| Policy-enforced sandbox environment extension for Inspect AI | 228 |
| Multi-agent system for AI evaluations in AISI's inspect-ai framework | 215 |
| Utility to connect to and use Fulcrum Research to understand your agents | 187 |
| A lightweight library for running AI Control experiments | 161 |
| Inspect-AI–based agents and tools | 151 |
| Human evaluation for Control Arena settings | 149 |
| Tool for configuring and running control_arena experiments using .yaml files | 131 |
| Gage support for Inspect AI | 130 |
| Python package to interface with the AI infrastructure over at ohmytofu.ai | 115 |
| | 90 |
| High-level, zero-code interface for evaluating LLMs using Inspect-AI | 71 |
| Use Goodfire Ember API with UK-AISI Inspect AI | 65 |