PyPI Stats

LLM Testing Python Packages

Python packages with the GitHub topic llm-testing. Sorted by relevance, with stars and monthly downloads.
Basaltlabs-app
gauntlet-cli

Behavioral reliability under pressure. Test how LLMs behave when things get hard.

10K 6 0
Pacific-AI-Corp
langtest

Pacific AI provides a library for delivering safe & effective NLP models.

3K 556 49
raga-ai-hub
agentneo

Python SDK for Agent AI Observability, Monitoring and Evaluation Framework. Includes features like agent, LLM and tool tracing, debugging of multi-agent systems, a self-hosted dashboard, and advanced analytics with timeline and execution graph views.

2K 16K 4K
qualixar
agentassay

Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601

2K 4 1
JohnSnowLabs
nlptest

Deliver safe & effective language models

2K 556 49
nullpointerdepressivedisorder
infer-check

Correctness and reliability testing for LLM inference engines

2K 2 0
LLAMATOR-Core
llamator

Framework for testing vulnerabilities of GenAI systems.

1K 207 19
NahuelGiudizi
ai-safety-tester

LLM security testing framework with CVE-style severity scoring and multi-model benchmarking

949 0 0
AquibNawab
agentcloudkelp

YAML-first stress testing for AI agents. Write a contract, inject faults, catch behavioral drift, enforce cost budgets. No Python test code needed — just kelp.yaml and a terminal.

849 1 0
Swanand33
llm-behave

Behavioral testing for LLM applications. pytest plugin with semantic assertions, multi-turn conversation testing, and drift detection. No LLM judge needed.

676 1 0
ssilwal29
api-test-ninja

API Testing Framework to automate and simplify API testing using LLM Agents and tests defined in plain English.

633 2 1
Addepto
ccheck

MIT-licensed Framework for LLMs, RAGs, Chatbots testing. Configurable via YAML and integrable into CI pipelines for automated testing.

527 95 11
chanikkyasaai
trajex

AI agent behavioral testing — learns what correct looks like, catches deviations automatically. Zero API keys needed.

389 0 0
evalops
mocktopus

🐙 Multi-armed mocks for LLM apps - Drop-in replacement for OpenAI/Anthropic APIs for deterministic testing

342 6 0
adwantg
toolcallcheck

Deterministic Python testing for tool-using agents. Mock MCP tools, assert exact tool calls and trajectories, verify headers, and run offline in CI.

338 0 0
vincentkoc
tinyqabenchmarkpp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing & regression-catching.

308 15 0
Rowusuduah
llm-sentry

Unified AI Reliability Platform. One install, 12 diagnostic engines. Zero-dependency LLM pipeline monitoring.

275 0 0
RahulMK22
pyllmtest

🚀 Comprehensive testing framework for LLM applications with semantic assertions, multi-provider support, RAG testing, and prompt optimization. Test AI the right way!

209 1 0
tm243
agent-assembly-line

The simple way to build and embed AI agents into any software stack. Code-native, modular, and LLM-agnostic.

194 0 2
syncreus
syncreus-eval

Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent evals for OpenAI, Anthropic, and more. 14 evaluators, pytest plugin, composite trust scores.

166 2 0
sazed5055
llmtest-framework

pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions

163 3 0
LGTMLabs
misalign

A Python library testing LLMs with prompts

125 0 0
dariero
ragaliq

LLM & RAG evaluation testing framework✨

100 1 0
Data from PyPI, GitHub, ClickHouse, and BigQuery