Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders

12 May 2025

Papers citing "Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders"

9 / 9 papers shown

Title
Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
Xuansheng Wu, Jiayi Yuan, Wenlin Yao, Xiaoming Zhai, Ninghao Liu
LLMSV | 147 10 0 | 24 Feb 2025

SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Jundong Li
LLMSV | 110 9 0 | 17 Feb 2025

Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Shruti Joshi, Andrea Dittadi, Sébastien Lachapelle, Dhanya Sridhar
LLMSV | 79 2 0 | 14 Feb 2025

Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
Kola Ayonrinde
66 5 0 | 04 Nov 2024

Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Davide Ghilardi, Federico Belotti, Marco Molinari
64 5 0 | 28 Oct 2024

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
KELM, LLMSV | 101 25 0 | 21 Oct 2024

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun, Jordan K. Taylor, Nicholas Goldowsky-Dill, Lee D. Sharkey
70 39 0 | 17 May 2024

SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
RALM | 286 8,134 0 | 16 Jun 2016

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
83 283 0 | 14 Jan 2016
