PyPI Stats

Mechanistic Interpretability Python Packages

Python packages with the GitHub topic mechanistic-interpretability. Sorted by relevance, with stars and monthly downloads.
stanfordnlp
pyvene

Stanford NLP Python library for understanding and improving PyTorch models via interventions

6K 872 103
a9lim
llmoji

Making agents cuter

4K 2 0
butanium
nnterp

Unified access to Large Language Model modules using NNsight

3K 109 11
designer-coderajay
glassbox-mech-interp

Mechanistic interpretability + EU AI Act Annex IV compliance. 21/21 frameworks: ACDC edge-circuit discovery, multi-arch GQA/RMSNorm adapter (Llama-3/Mistral/Phi-3), cross-model comparison, causal scrubbing, DAS, Hessian bounds, BH FDR, folded LayerNorm, SAE polysemanticity, multi-corruption, held-out validation. Dual-licensed (MIT core + BSL 1.1 compliance engine).

3K 1 0
OpenInterpretability
openinterp

openinterp — Python SDK + CLI. FabricationGuard hallucination probe + ProbeBench leaderboard + Atlas search + Trace generation. pip install openinterp

2K 0 0
wheattoast11
carl-studio

Coherence-Aware Reinforcement Learning (CARL) - breakthrough LLM post-training and test-time training paradigm. carl builds the world's most advanced and intelligent agent systems that are a step change over current gen agents

2K 2 1
steering-vectors
steering-vectors

Steering vectors for transformer language models in PyTorch / Hugging Face

1K 149 18
dthinkr
mlx-lens

Mechanistic interpretability on Apple Silicon: steering vectors, residual capture, and SAE analysis for MLX models

949 1 0
arturoornelasb
reptimeline

Track how discrete representations evolve during neural network training — lifecycle events, phase transitions, ontology discovery, and causal verification

853 0 0
caiovicentino
mechreward

Mechanistic interpretability as reward signal for RL training of LLMs

691 5 0
wheattoast11
carl-core

Coherence-Aware Reinforcement Learning (CARL) - breakthrough LLM post-training and test-time training paradigm. carl builds the world's most advanced and intelligent agent systems that are a step change over current gen agents

477 2 1
XXO47OXX
neuro-scan

LLM Neuroanatomy Explorer — map what each transformer layer does

308 0 0
evan-lloyd
graphpatch

graphpatch is a library for activation patching on PyTorch neural network models.

247 21 0
scouzi1966
mlxlmprobe

Visual probing and interpretability tool for MLX language models

246 3 0
taufeeque9
codebook-features

Sparse and discrete interpretability tool for neural networks

204 64 5
designer-coderajay
glassbox-mcp

Open-source EU AI Act Annex IV compliance toolkit. Mechanistic interpretability + circuit discovery for transformers. One function call generates a court-ready evidence package

143 1 0
yash-srivastava19
arrakis-mi

A mechanistic interpretability library for nerds.

102 31 4
levashi
reprobe

Phase-aware LLM activation steering and linear probing. A memory-efficient, practical implementation of Representation Engineering (RepE) for safety research.

91 2 0
aiexplorations
todacomm

Topological Data Analysis Comparison of Multiple Models - A framework for characterizing transformer representations using persistent homology

62 0 0
    • Data from PyPI, GitHub, ClickHouse, and BigQuery