PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
confident-ai / deepeval
The LLM Evaluation Framework
Downloads: 3.5M · Stars: 15K · Forks: 1K

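For orientation, a minimal sketch following deepeval's documented quickstart pattern: wrap one model interaction in an LLMTestCase and score it with a built-in metric. It assumes the package is installed and an OpenAI key is configured, since the default metrics use an LLM as judge.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One model interaction to judge: the prompt and the answer your app produced.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
)

# Built-in metric scored by an LLM judge; the test fails below the threshold.
metric = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=[test_case], metrics=[metric])
```
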
zli12321 / qa-metrics
An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.
Downloads: 11K · Stars: 61 · Forks: 6

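Two of the metrics named above, exact match and F1 score, are the standard SQuAD-style string metrics. A from-scratch sketch of both in plain Python (not qa-metrics' own API, whose function names may differ):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall on word overlap."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # True
print(round(f1_score("in Paris, France", "Paris"), 2))  # 0.5
```
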
parea-ai / parea-ai
Python SDK for experimenting with, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Downloads: 10K · Stars: 82 · Forks: 11

rhesis-ai / rhesis-sdk
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
Downloads: 2K · Stars: 317 · Forks: 24

cvs-health / langfair
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
Downloads: 2K · Stars: 257 · Forks: 43

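One common shape such a use-case-level assessment takes is a counterfactual check: hold the prompt fixed, swap only the demographic term, and compare how the responses score. A generic sketch of that idea follows; ask_llm and score are hypothetical stand-ins for your model call and scorer, and none of this is LangFair's actual API.

```python
# Hypothetical helpers; none of this is LangFair's actual API. It sketches the
# counterfactual idea behind use-case-level bias checks: hold the prompt fixed,
# swap only the demographic term, and compare how the responses score.
from typing import Callable

def counterfactual_gap(
    ask_llm: Callable[[str], str],   # hypothetical: your model call
    score: Callable[[str], float],   # hypothetical: e.g. a sentiment scorer
    template: str,                   # prompt with a {group} placeholder
    groups: list[str],
) -> float:
    """Largest pairwise score difference across group substitutions."""
    scores = [score(ask_llm(template.format(group=g))) for g in groups]
    return max(scores) - min(scores)

# A large gap means the responses vary with the demographic term alone:
# counterfactual_gap(ask_llm, sentiment,
#                    "Write a reference letter for a {group} engineer.",
#                    ["male", "female"])
```
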
msoedov / agentic-security
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Downloads: 2K · Stars: 2K · Forks: 248

rhesis-ai / rhesis
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
Downloads: 1K · Stars: 317 · Forks: 24

gmitt98 / fieldtest
LLM evaluation framework — define what "correct", "well-formed", and "safe" mean before you measure
Downloads: 1K · Stars: 0 · Forks: 0

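The idea of pinning down "correct", "well-formed", and "safe" before measuring can be illustrated generically: declare the checks up front, then score every output against them. A plain-Python sketch, not fieldtest's API:

```python
# Generic illustration, not fieldtest's API: declare what "correct",
# "well-formed", and "safe" mean before scoring any output.
import json

def _is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# Criteria fixed up front, before any outputs are inspected.
CRITERIA = {
    "well_formed": _is_json,                          # parses as JSON
    "correct": lambda out: "refund" in out.lower(),   # example task check
    "safe": lambda out: "password" not in out.lower(),  # example safety check
}

def evaluate_output(output: str) -> dict[str, bool]:
    """Score one model output against every pre-declared criterion."""
    return {name: check(output) for name, check in CRITERIA.items()}

print(evaluate_output('{"answer": "We offer a 30-day refund."}'))
# {'well_formed': True, 'correct': True, 'safe': True}
```
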
nhsengland / evalsense
Tools for systematic large language model evaluations
Downloads: 724 · Stars: 4 · Forks: 1

Addepto / ccheck
MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
Downloads: 508 · Stars: 95 · Forks: 11

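The YAML-configured, CI-friendly pattern the description mentions looks roughly like this in generic form; the config schema here is invented for illustration, not ccheck's actual format, and ask_llm is a hypothetical stand-in for your chatbot or RAG call.

```python
# Generic sketch of YAML-driven testing with a CI-friendly exit code.
# The schema below is invented, not ccheck's actual format.
import sys
import yaml  # pip install pyyaml

CONFIG = """
tests:
  - prompt: "What is your refund policy?"
    must_contain: "30-day"
"""

def run(ask_llm):
    """ask_llm is a hypothetical stand-in for your chatbot/RAG call."""
    failures = 0
    for case in yaml.safe_load(CONFIG)["tests"]:
        answer = ask_llm(case["prompt"])
        if case["must_contain"] not in answer:
            print(f"FAIL: {case['prompt']!r}")
            failures += 1
    # A non-zero exit code makes a CI pipeline mark the run as failed.
    sys.exit(1 if failures else 0)
```
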
multinear / multinear
Multinear platform
Downloads: 379 · Stars: 44 · Forks: 1

vero-labs-ai / vero-eval
Open-source framework for evaluating AI agents
Downloads: 261 · Stars: 29 · Forks: 2

mr-gpt / llmevals
Eval
Downloads: 166 · Stars: 15K · Forks: 1K

msoedov / mseep-agentic-security
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Downloads: 163 · Stars: 2K · Forks: 248

mr-gpt / deepevals
The LLM Evaluation Framework
Downloads: 162 · Stars: 15K · Forks: 1K

msoedov / langalf
Agentic LLM vulnerability scanner
Downloads: 113 · Stars: 2K · Forks: 245

mr-gpt / testllm
Deep eval provides an evaluation platform to accelerate development of LLMs and agents
Downloads: 81 · Stars: 15K · Forks: 1K

Data from PyPI, GitHub, ClickHouse, and BigQuery