Automatically discover where and why your LLM is failing — embedding-space clustering + statistical hypothesis testing to surface input slices with elevated failure rates and audit test suite coverage gaps.