PyPI Stats

Search Packages

Find Python packages by name, description, or GitHub topic, or filter by metrics.
confident-ai / deepeval
The LLM Evaluation Framework
Downloads: 3.5M · Stars: 15K · Forks: 1K

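For orientation, a minimal sketch following deepeval's documented quickstart pattern: wrap one model interaction in an LLMTestCase and score it with a built-in metric. It assumes the package is installed and an OpenAI key is configured, since the default metrics use an LLM as judge.

```python
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# One model interaction to judge: the prompt and the answer your app produced.
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    actual_output="We offer a 30-day full refund at no extra cost.",
)

# Built-in metric scored by an LLM judge; the test fails below the threshold.
metric = AnswerRelevancyMetric(threshold=0.7)

evaluate(test_cases=[test_case], metrics=[metric])
```
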
zli12321 / qa-metrics
An easy-to-use Python package for running quick, basic QA evaluations. It includes standardized QA evaluation metrics and semantic evaluation metrics: black-box and open-source large language model prompting and evaluation, exact match, F1 score, PEDANT semantic match, and transformer match. The package also supports prompting the OpenAI and Anthropic APIs.
Downloads: 11K · Stars: 61 · Forks: 6

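Two of the metrics named above, exact match and F1 score, are the standard SQuAD-style string metrics. A from-scratch sketch of both in plain Python (not qa-metrics' own API, whose function names may differ):

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall on word overlap."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # True
print(round(f1_score("in Paris, France", "Paris"), 2))  # 0.5
```
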
parea-ai / parea-ai
Python SDK for experimenting with, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
Downloads: 10K · Stars: 82 · Forks: 11

rhesis-ai / rhesis-sdk
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
Downloads: 2K · Stars: 317 · Forks: 24

cvs-health / langfair
LangFair is a Python library for conducting use-case-level LLM bias and fairness assessments.
Downloads: 2K · Stars: 257 · Forks: 43

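One common shape such a use-case-level assessment takes is a counterfactual check: hold the prompt fixed, swap only the demographic term, and compare how the responses score. A generic sketch of that idea follows; ask_llm and score are hypothetical stand-ins for your model call and scorer, and none of this is LangFair's actual API.

```python
# Hypothetical helpers; none of this is LangFair's actual API. It sketches the
# counterfactual idea behind use-case-level bias checks: hold the prompt fixed,
# swap only the demographic term, and compare how the responses score.
from typing import Callable

def counterfactual_gap(
    ask_llm: Callable[[str], str],   # hypothetical: your model call
    score: Callable[[str], float],   # hypothetical: e.g. a sentiment scorer
    template: str,                   # prompt with a {group} placeholder
    groups: list[str],
) -> float:
    """Largest pairwise score difference across group substitutions."""
    scores = [score(ask_llm(template.format(group=g))) for g in groups]
    return max(scores) - min(scores)

# A large gap means the responses vary with the demographic term alone:
# counterfactual_gap(ask_llm, sentiment,
#                    "Write a reference letter for a {group} engineer.",
#                    ["male", "female"])
```
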
msoedov / agentic-security
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Downloads: 2K · Stars: 2K · Forks: 248

rhesis-ai / rhesis
The testing platform for AI teams. Bring engineers, PMs, and domain experts together to generate tests, simulate (adversarial) conversations, and trace every failure to its root cause.
Downloads: 1K · Stars: 317 · Forks: 24

gmitt98 / fieldtest
LLM evaluation framework — define what "correct", "well-formed", and "safe" mean before you measure
Downloads: 1K · Stars: 0 · Forks: 0

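The idea of pinning down "correct", "well-formed", and "safe" before measuring can be illustrated generically: declare the checks up front, then score every output against them. A plain-Python sketch, not fieldtest's API:

```python
# Generic illustration, not fieldtest's API: declare what "correct",
# "well-formed", and "safe" mean before scoring any output.
import json

def _is_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

# Criteria fixed up front, before any outputs are inspected.
CRITERIA = {
    "well_formed": _is_json,                          # parses as JSON
    "correct": lambda out: "refund" in out.lower(),   # example task check
    "safe": lambda out: "password" not in out.lower(),  # example safety check
}

def evaluate_output(output: str) -> dict[str, bool]:
    """Score one model output against every pre-declared criterion."""
    return {name: check(output) for name, check in CRITERIA.items()}

print(evaluate_output('{"answer": "We offer a 30-day refund."}'))
# {'well_formed': True, 'correct': True, 'safe': True}
```
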
nhsengland / evalsense
Tools for systematic large language model evaluations
Downloads: 724 · Stars: 4 · Forks: 1

Addepto / ccheck
MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.
Downloads: 508 · Stars: 95 · Forks: 11

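The YAML-configured, CI-friendly pattern the description mentions looks roughly like this in generic form; the config schema here is invented for illustration, not ccheck's actual format, and ask_llm is a hypothetical stand-in for your chatbot or RAG call.

```python
# Generic sketch of YAML-driven testing with a CI-friendly exit code.
# The schema below is invented, not ccheck's actual format.
import sys
import yaml  # pip install pyyaml

CONFIG = """
tests:
  - prompt: "What is your refund policy?"
    must_contain: "30-day"
"""

def run(ask_llm):
    """ask_llm is a hypothetical stand-in for your chatbot/RAG call."""
    failures = 0
    for case in yaml.safe_load(CONFIG)["tests"]:
        answer = ask_llm(case["prompt"])
        if case["must_contain"] not in answer:
            print(f"FAIL: {case['prompt']!r}")
            failures += 1
    # A non-zero exit code makes a CI pipeline mark the run as failed.
    sys.exit(1 if failures else 0)
```
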
multinear / multinear
Multinear platform
Downloads: 379 · Stars: 44 · Forks: 1

vero-labs-ai / vero-eval
Open-source framework for evaluating AI agents
Downloads: 261 · Stars: 29 · Forks: 2

mr-gpt / llmevals
Eval
Downloads: 166 · Stars: 15K · Forks: 1K

msoedov / mseep-agentic-security
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
Downloads: 163 · Stars: 2K · Forks: 248

mr-gpt / deepevals
The LLM Evaluation Framework
Downloads: 162 · Stars: 15K · Forks: 1K

msoedov / langalf
Agentic LLM vulnerability scanner
Downloads: 113 · Stars: 2K · Forks: 245

mr-gpt / testllm
Deep eval provides an evaluation platform to accelerate development of LLMs and agents
Downloads: 81 · Stars: 15K · Forks: 1K

Data from PyPI, GitHub, ClickHouse, and BigQuery