Comprehensive AI Model Evaluation Framework with support for multiple LLM providers
Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Features include agent, LLM, and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.
Statistical analysis methods for comparing prompt and model performance in LLM evaluations.
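As a rough illustration of the kind of comparison described above, the sketch below applies a paired bootstrap to per-example scores from two prompts or models evaluated on the same test set. It is not taken from any of the listed projects: the function name `paired_bootstrap`, its parameters, and the choice of a percentile bootstrap are all assumptions made for this example.

```python
import random
from statistics import mean


def paired_bootstrap(scores_a, scores_b, n_resamples=2000, seed=0):
    """Return (mean difference, 95% CI low, 95% CI high) for B minus A.

    Hypothetical helper: scores_a[i] and scores_b[i] must be scores
    for the same evaluation example under variants A and B.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and the same length")

    rng = random.Random(seed)  # fixed seed so results are reproducible
    # Work on per-example differences so the pairing is preserved.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    # Resample the differences with replacement and record each mean.
    resampled_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )

    # Percentile interval: cut 2.5% from each tail.
    low = resampled_means[int(0.025 * n_resamples)]
    high = resampled_means[int(0.975 * n_resamples) - 1]
    return mean(diffs), low, high


if __name__ == "__main__":
    # Illustrative scores only, not real evaluation results.
    prompt_a = [0.60, 0.70, 0.50, 0.80, 0.65, 0.55, 0.75, 0.60]
    prompt_b = [0.70, 0.75, 0.55, 0.85, 0.60, 0.65, 0.80, 0.70]
    diff, low, high = paired_bootstrap(prompt_a, prompt_b)
    print(f"mean difference: {diff:.3f}, 95% CI: [{low:.3f}, {high:.3f}]")
```

If the resulting interval excludes zero, that is some evidence the two variants differ on this test set. With small samples the interval will be wide, and a paired t-test or permutation test may be a reasonable alternative.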