Activation steering and trait monitoring for Hugging Face Transformers
Steering vectors for transformer language models in PyTorch / Hugging Face
Linear probes and activation steering for transformer models