YAML-first stress testing for AI agents. Write a contract, inject faults, catch behavioral drift, enforce cost budgets. No Python test code needed — just kelp.yaml and a terminal.
Flakestorm — Automated Robustness Testing for AI Agents. Stop guessing if your agent really works. FlakeStorm generates adversarial mutations and exposes failures your manual tests and evals miss.