LLM Metrics Python Packages

tool-scorer

Lightweight tool-call testing for LLM agents. Deterministic, local, zero API cost. Compare expected vs actual tool calls in 3 lines of Python. Supports OpenAI, Anthropic, Gemini.
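To illustrate what a deterministic, local comparison of expected vs. actual tool calls can look like, here is a minimal sketch in plain Python. It does not use tool-scorer's API, which is not shown here. The `ToolCall` class and `score_tool_calls` function are hypothetical names introduced for illustration, and the positional exact-match scoring rule is an assumption, not necessarily the package's metric.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool call: a tool name plus its arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def score_tool_calls(expected: list[ToolCall], actual: list[ToolCall]) -> float:
    """Return the fraction of positions where name and arguments match exactly.

    The denominator is the length of the longer list, so missing or
    extra calls lower the score. This rule is an assumption for the sketch.
    """
    if not expected and not actual:
        return 1.0  # nothing expected, nothing called
    matched = sum(
        1
        for exp, act in zip(expected, actual)
        if exp.name == act.name and exp.arguments == act.arguments
    )
    return matched / max(len(expected), len(actual))


# Example data (made up): the second call has a wrong argument value.
expected = [
    ToolCall("get_weather", {"city": "Paris"}),
    ToolCall("send_email", {"to": "a@example.com"}),
]
actual = [
    ToolCall("get_weather", {"city": "Paris"}),
    ToolCall("send_email", {"to": "b@example.com"}),
]
score = score_tool_calls(expected, actual)
```

Because the comparison runs on already-recorded calls, it needs no model or network access, which is what makes this style of test repeatable and free of API cost.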
