SAEs Are Good for Steering -- If You Select the Right Features

26 May 2025

Papers citing "SAEs Are Good for Steering -- If You Select the Right Features"

FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Yongbin Li, Zhiting Fan, Ruizhe Chen, Xiaotang Gai, Luqi Gong, Yan Zhang, Zuozhu Liu
LLMSV 99 6 0, 20 Apr 2025

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
175 159 0, 28 Mar 2024