AI Testing Python Packages

langwatch-scenario

Agentic testing for agentic codebases

65K 869 60

giskard

🐢 Open-Source Evaluation & Testing library for LLM Agents

40K 5K 446

prompture

Prompture is an API-first library for requesting structured JSON output from LLMs (or any structure), validating it against a schema, and running comparative tests between models.

11K 9 0

tailtester

tailtest — the test + security validator that lives inside Claude Code. Never blocks, never lies.

4K 7 1

langtest

Pacific AI provides a library for delivering safe & effective NLP models.

3K 556 49

aat-devqa

AI-powered automated E2E testing. Just enter a URL — AI generates and runs test scenarios.

3K 5 1

nlptest

Deliver safe & effective language models

2K 556 49

tenro

Open-source simulation harness for testing AI agents. Simulate LLM and tool calls to test edge cases, failure paths, and agent logic without live API calls.

1K 5 0

py-toolguard

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

1K 12 3

intentguard

A Python library for verifying code properties using natural language assertions.

1K 35 0

chatbot-tracer

An automated approach for exploring and testing conversational agents using large language models. TRACER discovers chatbot functionalities, generates user profiles, and creates comprehensive test suites for conversational AI systems.

1K 2 0

sharingan-autotest

Autonomous testing agent for Claude Code. Discovers, tests, diagnoses, and fixes your web app.

919 1 0

maia-test-framework

A pytest-based framework for testing multi-agent AI systems. It provides a flexible and extensible platform for creating and running complex multi-agent simulations and capturing the results.

681 1 0

agentrial

Statistical evaluation framework for AI agents - pytest for agent trajectories

664 16 2

alignmenter

Persona-aligned evaluation toolkit for auditing conversational AI authenticity, safety, and stability.

589 5 0

llm-behave

Behavioral testing for LLM applications. pytest plugin with semantic assertions, multi-turn conversation testing, and drift detection. No LLM judge needed.

586 1 0

ccheck

MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrable into CI pipelines for automated testing.

508 95 11

aetherlab

Official Python SDK for AetherLab's AI Guardrails and Compliance Platform

380 1 0

llm-sentry

Unified AI Reliability Platform. One install, 12 diagnostic engines. Zero-dependency LLM pipeline monitoring.

215 0 0

pyllmtest

🚀 Comprehensive testing framework for LLM applications with semantic assertions, multi-provider support, RAG testing, and prompt optimization. Test AI the right way!

147 1 0

llmtest-framework

pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions

138 3 0

forge-nc

Local-first AI coding assistant with adversarial model testing, transparent context management, and cryptographic audit trails

107 2 0

housemonkey

Chaos testing for AI apps. 18 extreme personas attack your AI to find edge cases before users do. OWASP LLM Top 10 coverage.

100 1 0

llmtestr

A new package that helps developers integration-test AI and LLM applications by validating structured outputs. It takes a user's test scenario or prompt as input, sends it to an LLM, and uses pattern

93 1 0
