PyPI Stats

Search Packages

Find Python packages by name, description, GitHub topic, or filter by metrics
Basaltlabs-app
gauntlet-cli

Behavioral reliability under pressure. Test how LLMs behave when things get hard.

10K 6 0
Pacific-AI-Corp
langtest

Pacific AI provides a library for delivering safe & effective NLP models.

3K 556 49
raga-ai-hub
agentneo

Python SDK for AI agent observability, monitoring, and evaluation. Includes agent, LLM, and tool tracing; multi-agent system debugging; a self-hosted dashboard; and advanced analytics with timeline and execution-graph views.

2K 16K 4K
qualixar
agentassay

Token-efficient stochastic testing for AI agents. 5-20x cost reduction. 10 framework adapters. Paper: arXiv:2603.02601

2K 4 1
nullpointerdepressivedisorder
infer-check

Correctness and reliability testing for LLM inference engines

2K 2 0
JohnSnowLabs
nlptest

Deliver safe & effective language models

2K 556 49
LLAMATOR-Core
llamator

Framework for testing vulnerabilities of GenAI systems.

1K 207 19
NahuelGiudizi
ai-safety-tester

LLM security testing framework with CVE-style severity scoring and multi-model benchmarking

847 0 0
AquibNawab
agentcloudkelp

YAML-first stress testing for AI agents. Write a contract, inject faults, catch behavioral drift, enforce cost budgets. No Python test code needed — just kelp.yaml and a terminal.

740 1 0
Swanand33
llm-behave

Behavioral testing for LLM applications. pytest plugin with semantic assertions, multi-turn conversation testing, and drift detection. No LLM judge needed.

586 1 0
ssilwal29
api-test-ninja

API testing framework that automates and simplifies API testing using LLM agents, with tests defined in plain English.

570 2 1
Addepto
ccheck

MIT-licensed framework for testing LLMs, RAG systems, and chatbots. Configurable via YAML and integrates into CI pipelines for automated testing.

508 95 11
adwantg
toolcallcheck

Deterministic Python testing for tool-using agents. Mock MCP tools, assert exact tool calls and trajectories, verify headers, and run offline in CI.

400 0 0
chanikkyasaai
trajex

AI agent behavioral testing — learns what correct looks like, catches deviations automatically. Zero API keys needed.

366 0 0
evalops
mocktopus

🐙 Multi-armed mocks for LLM apps - Drop-in replacement for OpenAI/Anthropic APIs for deterministic testing

340 6 0
vincentkoc
tinyqabenchmarkpp

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold set plus on-demand multilingual synthetic packs), a generator CLI, and a CI-ready eval harness for ultra-fast LLM smoke testing and regression catching.

260 15 0
Rowusuduah
llm-sentry

Unified AI Reliability Platform. One install, 12 diagnostic engines. Zero-dependency LLM pipeline monitoring.

215 0 0
syncreus
syncreus-eval

Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent evals for OpenAI, Anthropic, and more. 14 evaluators, pytest plugin, composite trust scores.

164 2 0
RahulMK22
pyllmtest

🚀 Comprehensive testing framework for LLM applications with semantic assertions, multi-provider support, RAG testing, and prompt optimization. Test AI the right way!

147 1 0
sazed5055
llmtest-framework

pytest for LLM apps - Test for grounding failures, prompt injection, safety violations, and regressions

138 3 0
LGTMLabs
misalign

A Python library for testing LLMs with prompts.

120 0 0
tm243
agent-assembly-line

The simple way to build and embed AI agents into any software stack. Code-native, modular, and LLM-agnostic.

116 0 2
dariero
ragaliq

LLM & RAG evaluation testing framework — hallucination detection, faithfulness metrics, answer relevance scoring, and retrieval pipeline testing with pytest integration

86 1 0
  • Data from PyPI, GitHub, ClickHouse, and BigQuery