Mechanistic Interpretability Python Packages

pyvene

Stanford NLP Python library for understanding and improving PyTorch models via interventions

6K 872 103

llmoji

Making agents cuter

4K 2 0

nnterp

Unified access to Large Language Model modules using NNsight

3K 109 11

glassbox-mech-interp

Mechanistic interpretability + EU AI Act Annex IV compliance. 21/21 frameworks: ACDC edge-circuit discovery, multi-arch GQA/RMSNorm adapter (Llama-3/Mistral/Phi-3), cross-model comparison, causal scrubbing, DAS, Hessian bounds, BH FDR, folded LayerNorm, SAE polysemanticity, multi-corruption, held-out validation. Dual-licensed (MIT core + BSL 1.1 compliance engine).

3K 1 0

openinterp

Python SDK and CLI: FabricationGuard hallucination probe, ProbeBench leaderboard, Atlas search, and Trace generation. Install with: pip install openinterp

2K 0 0

carl-studio

Coherence-Aware Reinforcement Learning (CARL): a breakthrough LLM post-training and test-time training paradigm. carl builds the world's most advanced and intelligent agent systems, a step change over current-generation agents

2K 2 1

steering-vectors

Steering vectors for transformer language models in PyTorch / Hugging Face

1K 149 18

mlx-lens

Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models

949 1 0

reptimeline

Track how discrete representations evolve during neural network training — lifecycle events, phase transitions, ontology discovery, and causal verification

853 0 0

mechreward

Mechanistic interpretability as reward signal for RL training of LLMs

691 5 0

carl-core

477 2 1

neuro-scan

LLM Neuroanatomy Explorer — map what each transformer layer does

308 0 0

graphpatch

graphpatch is a library for activation patching on PyTorch neural network models.

247 21 0

mlxlmprobe

Visual probing and interpretability tool for MLX language models

246 3 0

codebook-features

Sparse and discrete interpretability tool for neural networks

204 64 5

glassbox-mcp

Open-source EU AI Act Annex IV compliance toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a court-ready evidence package

143 1 0

arrakis-mi

A mechanistic interpretability library for nerds.

102 31 4

reprobe

Phase-aware LLM activation steering and linear probing. A memory-efficient, practical implementation of Representation Engineering (RepE) for safety research.

91 2 0

todacomm

Topological Data Analysis Comparison of Multiple Models: a framework for characterizing transformer representations using persistent homology

62 0 0