48 dependents
| Package | Description | Downloads/month |
| --- | --- | --- |
| | Transcript Analysis for AI Agents | 560K |
| | Software Engineering Agents for Inspect AI | 105K |
| | Collection of evals for Inspect AI | 96K |
| | Docent SDK | 47K |
| | Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backend... | 22K |
| | An Inspect extension for cyber evaluations | 15K |
| | A Kubernetes Sandbox Environment for Inspect | 13K |
| | ControlArena is a collection of settings, model organisms and protocols - for ru... | 10K |
| | Agent evaluation toolkit | 6K |
| | Provider-agnostic, open-source evaluation infrastructure for language models | 5K |
| | SDK for the Vals AI Platform | 4K |
| | An alignment auditing agent capable of quickly exploring alignment hypotheses | 3K |
| | Inspect AI interface to Harbor tasks | 3K |
| | Human Time-to-Completion Evaluation CLI | 3K |
| | Collection of sandboxes for Inspect AI | 3K |
| | Inspect Flow is a workflow stack built on Inspect AI that enables research organ... | 2K |
| | A Python SDK for Asteroid | 1K |
| | Framework for generating behavioral evaluations of frontier AI models | 1K |
| | | 1K |
| | Integration between Inspect and Weights & Biases | 1K |
| | Run OpenReward environments through the Inspect eval platform | 936 |
| | Factorio Learning Environment | 917 |
| | MLflow integration for Inspect AI: experiment tracking, execution tracing, and S... | 776 |
| | Tools for systematic large language model evaluations | 724 |
| | A Control Arena setting for evaluating agents inserting backdoors while refactor... | 721 |
| | Pentester CLI | 661 |
| | Podman sandbox environment for Inspect AI | 610 |
| | Bench-AF: Alignment Faking Benchmark | 598 |
| | Entropy Labs' Sentinel is an agent control plane that enables efficient oversigh... | 514 |
| | MLflow integration for Inspect AI | 497 |
| | | 445 |
| | Opt-in lint for Inspect AI tasks: warn when @verifiable_task uses a model-graded... | 402 |
| | Chain-of-thought monitorability and faithfulness evaluation for reasoning-model ... | 387 |
| | Generalized Cognitive Refinement Iteration - Hierarchical Multi-Agent System wit... | 386 |
| | Pluralistic alignment evaluation benchmark for LLMs | 275 |
| | An utterly useless package that imports everything for you. Now with top 1000 PyP... | 247 |
| | Policy-enforced sandbox environment extension for Inspect AI | 228 |
| | Multi-agent system for AI evaluations in AISI's inspect-ai framework | 215 |
| | Utility to connect and use Fulcrum Research to understand your agents | 187 |
| | A lightweight library for running AI Control experiments | 161 |
| | Inspect-AI–based agents and tools | 151 |
| | Human evaluation for Control Arena settings | 149 |
| | Tool for configuring and running control_arena experiments using .yaml files | 131 |
| | Gage support for Inspect AI | 130 |
| | Python package to interface with the AI infrastructure over at ohmytofu.ai | 115 |
| | | 90 |
| | High-level, zero-code interface for evaluating LLMs using Inspect-AI | 71 |
| | Use Goodfire Ember API with UK-AISI Inspect AI | 65 |