Behavioral reliability under pressure. Test how LLMs behave when things get hard.
Behavioral auditing toolkit for LLMs: audit any model across 8 dimensions (factual accuracy, toxicity, bias, sycophancy, reasoning, refusal, deception, over-refusal) using teacher-forced confidence probes.
Compress LLMs while auditing whether they can still tell truths from myths. SVD compression and false-belief detection in one toolkit.
A relationship-aware memory layer for LLM chatbots that models the relationship, not just facts.