PyPI Stats

Hallucination Detection Python Packages

Python packages with the GitHub topic hallucination-detection, sorted by relevance. Monthly downloads and GitHub stars are shown for each package.
fathom-lab
styxx

Cognitive observability for LLM agents. Nine calibrated cognometric instruments: pure Python, MIT-licensed, no LLM required. 9-for-9 on the K=1 phase transition. Every Mind Leaves Vitals (DOI 10.5281/zenodo.19777921).

29K 5 1
Basaltlabs-app
gauntlet-cli

Behavioral reliability under pressure. Test how LLMs behave when things get hard.

10K 6 0
cvs-health
uqlm

UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection.

6K 1K 121
Nomadu27
insa-its

Runtime Security for Multi-Agent AI — Website & Documentation

6K 23 0
krlabsorg
lettucedetect

Lightweight hallucination detection framework for RAG applications

5K 568 39
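The core idea behind RAG hallucination detectors like lettucedetect is to check whether each claim in a generated answer is actually supported by the retrieved context. The sketch below illustrates that shape with a toy lexical-overlap heuristic; it is not lettucedetect's API (which uses trained token-level models), and the stop-word list and threshold are made up for the example.

```python
# Toy sketch of RAG hallucination detection: flag answer sentences with
# little lexical support in the retrieved context. Real frameworks use
# trained NLI/token-level models; this heuristic is illustrative only.
import re

STOP = {"the", "a", "an", "is", "was", "of", "in", "to", "and", "it"}

def support_score(sentence: str, context: str) -> float:
    """Fraction of the sentence's content words that appear in the context."""
    words = set(re.findall(r"[a-z]+", sentence.lower())) - STOP
    ctx = set(re.findall(r"[a-z]+", context.lower()))
    if not words:
        return 1.0
    return len(words & ctx) / len(words)

def flag_unsupported(answer: str, context: str, threshold: float = 0.5) -> list[str]:
    """Return answer sentences whose support score falls below the threshold."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return [s for s in sentences if support_score(s, context) < threshold]

context = "The Eiffel Tower was completed in 1889 and stands in Paris."
answer = "The Eiffel Tower is in Paris. It was painted gold in 1925."
print(flag_unsupported(answer, context))  # → ['It was painted gold in 1925.']
```

A production detector replaces the overlap score with an entailment model, but the per-sentence verify-against-context loop is the same.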
hinanohart
yuragi

LLM confidence-fragility analyzer. Perturbation-driven hallucination detection with real benchmarks (TruthfulQA, n=412, ensemble AUC 0.73; TriviaQA, n=200, confidence-inversion AUC 0.75).

4K 0 0
uptrain-ai
uptrain

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

3K 2K 203
mattijsmoens
sovereign-shield

AI security framework: deterministic, immutable input filtering; adaptive rule learning; optional LLM veto verification. Zero dependencies. Works without an LLM. Patent pending.

2K 19 7
anulum
director-ai

Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt

2K 0 0
MigoXLab
dingo-python

Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

2K 691 71
ENDEVSOLS
longtracer

RAG verification guardrails — detect hallucinations in LLM responses using hybrid STS + NLI.

1K 29 4
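Hybrid STS + NLI verification, as described for longtracer, pairs a semantic-textual-similarity retrieval step with an entailment check. The STS half can be sketched as cosine similarity between text vectors; the bag-of-words vectors below are a stand-in for the sentence embeddings a real system would use, and the function names are invented for this example.

```python
# Minimal sketch of the STS (semantic textual similarity) half of a hybrid
# STS + NLI verifier: cosine similarity between bag-of-words vectors.
# Real systems embed sentences with a trained encoder; this shows only
# the shape of the evidence-selection step.
import math
from collections import Counter

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_evidence(claim: str, passages: list[str]) -> tuple[str, float]:
    """Return the retrieved passage most similar to the claim, with its score."""
    scored = [(p, cosine_sim(claim, p)) for p in passages]
    return max(scored, key=lambda x: x[1])

passages = [
    "the mitochondria is the powerhouse of the cell",
    "paris is the capital of france",
]
passage, score = best_evidence("france capital is paris", passages)
print(passage)  # → paris is the capital of france
```

In the hybrid setup, the top-scoring passage would then be passed with the claim to an NLI model, which decides entailment versus contradiction.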
QWED-AI
qwed

The Deterministic Verification Protocol for AI - 11 verification engines for math, logic, code, SQL, facts, images, and more. Now with Agentic Security Guards.

1K 55 8
TKCollective
langchain-agentoracle

Trust layer for AI agents. Verify before you act. Per-claim verification via 4 independent sources. x402-native on Base, SKALE, Stellar.

627 0 0
aimonlabs
hdm2

HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification

430 11 0
CertainLogicAI
certainlogic-guard

Linguistic confidence gate for AI responses. Catches hedging words (maybe, I think). Not fact verification. Zero dependencies.

389 0 0
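A linguistic confidence gate of the kind certainlogic-guard describes scans model output for hedging phrases and blocks low-confidence responses; as its description notes, this checks wording, not facts. The sketch below is illustrative only: the phrase list, threshold, and function name are made up for this example and are not certainlogic-guard's API.

```python
# Illustrative linguistic confidence gate: flag responses containing
# hedging phrases ("maybe", "I think", ...). Checks wording only, not
# factual accuracy. Phrase list and threshold are example values.
import re

HEDGES = ["maybe", "i think", "possibly", "i believe", "not sure", "might be"]

def confidence_gate(response: str, max_hedges: int = 0) -> bool:
    """Return True if the response contains at most max_hedges hedging phrases."""
    text = response.lower()
    hits = sum(len(re.findall(re.escape(h), text)) for h in HEDGES)
    return hits <= max_hedges

print(confidence_gate("The capital of France is Paris."))       # → True
print(confidence_gate("I think it might be Paris, not sure."))  # → False
```

Because the gate is a pure string check, it needs zero dependencies and runs without an LLM, which matches the design trade-off these packages advertise.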
mattijsmoens
sovereign-shield-adaptive

AI security framework: deterministic, immutable input filtering; adaptive rule learning; optional LLM veto verification. Zero dependencies. Works without an LLM. Patent pending.

354 19 7
mattijsmoens
sovereign-mcp

Deterministic MCP Security Architecture. FrozenNamespace as Root of Trust for Model Context Protocol tool verification

309 3 4
anulum
backfire-kernel

Director-Class AI — Rust Backfire Kernel (50ms safety gate)

299 0 0
antrixsh
trusteval-ai

Enterprise LLM Evaluation & Responsible AI Framework — Benchmark bias, hallucination, PII leakage, and toxicity across Healthcare, BFSI, Retail & Legal industries. Supports OpenAI, Anthropic, Gemini & HuggingFace. Python SDK + CLI + Web Dashboard. 191 tests. Compliance-ready reports.

254 7 5
serhanwbahar
dep-hallucinator

Advanced security scanner for detecting AI-generated dependency confusion vulnerabilities with signature verification support

251 7 0
Seinarukiro2
wraith

Catches what your AI forgot to check. Deterministic linter for AI-generated Python code — hallucinated APIs, phantom packages, hardcoded secrets, taint analysis. 20 rules, zero config.

237 4 0
bdeva1975
hallucinationbench

Detect hallucinations in your RAG pipeline output — in two lines of Python.

202 2 0
syncreus
syncreus-eval

Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent evals for OpenAI, Anthropic, and more. 14 evaluators, pytest plugin, composite trust scores.

166 2 0
sarvanithin
medguard-llm

Healthcare-specific LLM guardrails middleware for clinical safety

156 0 0
• Data from PyPI, GitHub, ClickHouse, and BigQuery