openinterp: Python SDK and CLI with the FabricationGuard hallucination probe, ProbeBench leaderboard, Atlas search, and Trace generation. Install: pip install openinterp
Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models
Mechanistic interpretability as a reward signal for RL training of LLMs
A tool for visualizing and exploring feature activations in neural language models.
SANSA - sparse EASE for millions of items