PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
| Owner | Package | Description | Metrics |
|---|---|---|---|
| truera | trulens-core | Evaluation and Tracking for LLM Experiments and AI Agents | 86K · 3K · 271 |
| truera | trulens-otel-semconv | Evaluation and Tracking for LLM Experiments and AI Agents | 60K · 3K · 271 |
| truera | trulens-feedback | Evaluation and Tracking for LLM Experiments and AI Agents | 59K · 3K · 271 |
| truera | trulens-dashboard | Evaluation and Tracking for LLM Experiments and AI Agents | 57K · 3K · 271 |
| truera | trulens-eval | Evaluation and Tracking for LLM Experiments and AI Agents | 48K · 3K · 271 |
| truera | trulens | Evaluation and Tracking for LLM Experiments and AI Agents | 44K · 3K · 271 |
| Giskard-AI | giskard | 🐢 Open-Source Evaluation & Testing library for LLM Agents | 40K · 5K · 446 |
| truera | trulens-connectors-snowflake | Evaluation and Tracking for LLM Experiments and AI Agents | 38K · 3K · 271 |
| truera | trulens-providers-cortex | Evaluation and Tracking for LLM Experiments and AI Agents | 22K · 3K · 271 |
| truera | trulens-providers-openai | Evaluation and Tracking for LLM Experiments and AI Agents | 13K · 3K · 271 |
| truera | trulens-providers-litellm | Evaluation and Tracking for LLM Experiments and AI Agents | 12K · 3K · 271 |
| yohanpoul | etzchaim | A diagnosable brain for your LLM. Cognitive architecture in the SOAR/ACT-R/CLARION/LIDA lineage, for the LLM era. Apache 2.0. | 10K · 1 · 0 |
| truera | trulens-apps-langchain | Evaluation and Tracking for LLM Experiments and AI Agents | 10K · 3K · 271 |
| mozilla-ai | any-agent | A single interface to use and evaluate different agent frameworks | 6K · 1K · 93 |
| truera | trulens-apps-llamaindex | Evaluation and Tracking for LLM Experiments and AI Agents | 6K · 3K · 271 |
| truera | trulens-providers-bedrock | Evaluation and Tracking for LLM Experiments and AI Agents | 5K · 3K · 271 |
| truera | trulens-apps-langgraph | Evaluation and Tracking for LLM Experiments and AI Agents | 5K · 3K · 271 |
| truera | trulens-providers-langchain | Evaluation and Tracking for LLM Experiments and AI Agents | 5K · 3K · 271 |
| truera | trulens-providers-huggingface | Evaluation and Tracking for LLM Experiments and AI Agents | 3K · 3K · 271 |
| hidai25 | evalview | Open-source testing and regression detection framework for AI agents. Golden baseline diffing, CI/CD integration, works with LangGraph, CrewAI, OpenAI, Anthropic Claude, HuggingFace, Ollama, and MCP. | 3K · 95 · 21 |
| truera | trulens-apps-nemo | Evaluation and Tracking for LLM Experiments and AI Agents | 3K · 3K · 271 |
| janaraj | volnix | A living world where agents exist as participants alongside NPCs, internal actors, real service APIs, budgets, policies, and consequences. | 3K · 7 · 2 |
| truera | trulens-benchmark | Evaluation and Tracking for LLM Experiments and AI Agents | 3K · 3K · 271 |
| qualixar | agentassay | Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601 | 2K · 4 · 1 |
Data from PyPI, GitHub, ClickHouse, and BigQuery.