Statistical analysis methods for comparing prompt and model performance in LLM evaluations.
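One common statistical method for this kind of comparison is a paired bootstrap confidence interval on the per-example score difference between two prompts. The sketch below is a generic illustration of that technique, not this tool's API; the function name and the example scores are hypothetical.

```python
import numpy as np

def paired_bootstrap_diff(scores_a, scores_b, n_boot=10_000, seed=0):
    """Bootstrap a 95% CI for the mean score difference between two
    prompts evaluated on the same examples (paired design).
    Note: illustrative sketch, not part of any specific library."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    n = len(diffs)
    # Resample example indices with replacement, keeping pairs intact.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = diffs[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    return diffs.mean(), (lo, hi)

# Hypothetical per-example correctness (1 = correct) for two prompts.
prompt_a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
prompt_b = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]
mean_diff, (lo, hi) = paired_bootstrap_diff(prompt_a, prompt_b)
# mean_diff is 0.3; if the CI (lo, hi) excludes 0, the difference
# is unlikely to be resampling noise alone.
```

Pairing matters: resampling examples (rather than scores independently) preserves the per-example correlation between the two prompts, which tightens the interval.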
Who evaluates the evaluator? Judicator audits LLM-as-a-Judge systems for 7 documented bias types. Zero config. Works with any LLM.
The prompt engineering, prompt management, and prompt evaluation tool for Python.