Evals Python Packages

logfire

AI observability platform for production LLM and agent systems.

24.7M 4K 230

arize-phoenix

AI Observability & Evaluation

2.2M 10K 850

arize-phoenix-otel

AI Observability & Evaluation

1.6M 10K 850

arize-phoenix-client

AI Observability & Evaluation

881K 10K 850

arize-phoenix-evals

AI Observability & Evaluation

762K 10K 850

agentops

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including CrewAI, Agno, OpenAI Agents SDK, LangChain, AutoGen, AG2, and CamelAI.

718K 6K 571

trulens-core

Evaluation and Tracking for LLM Experiments and AI Agents

86K 3K 271

trulens-otel-semconv

Evaluation and Tracking for LLM Experiments and AI Agents

59K 3K 271

trulens-feedback

Evaluation and Tracking for LLM Experiments and AI Agents

58K 3K 271

trulens-dashboard

Evaluation and Tracking for LLM Experiments and AI Agents

56K 3K 271

trulens-eval

Evaluation and Tracking for LLM Experiments and AI Agents

48K 3K 271

trulens

Evaluation and Tracking for LLM Experiments and AI Agents

43K 3K 271

trulens-connectors-snowflake

Evaluation and Tracking for LLM Experiments and AI Agents

38K 3K 271

harbor-rewardkit

Harbor is a framework for running agent evaluations and creating and using RL environments.

25K 2K 978

evalica

Evalica, your favourite evaluation toolkit

23K 62 5

trulens-providers-cortex

Evaluation and Tracking for LLM Experiments and AI Agents

22K 3K 271

shadow-diff

Behavior contracts for AI agents

22K 4 0

trulens-providers-openai

Evaluation and Tracking for LLM Experiments and AI Agents

13K 3K 271

trulens-providers-litellm

Evaluation and Tracking for LLM Experiments and AI Agents

12K 3K 271

cellin

Build long-lived multimodal memory, dream over it, and retrieve context with transparent weighting.

12K 0 0

trulens-apps-langchain

Evaluation and Tracking for LLM Experiments and AI Agents

10K 3K 271

mcp-assert

Test your MCP server against the real protocol. Any language, any transport. No mocks, no imports, no language lock-in.

7K 4 1

selectools

Production-ready Python framework for AI agents with built-in guardrails, audit logging, cost tracking, and hybrid RAG. Supports OpenAI, Anthropic, Gemini, Ollama. By NichevLabs.

6K 9 1

trulens-apps-llamaindex

Evaluation and Tracking for LLM Experiments and AI Agents

5K 3K 271
