arXiv:2505.08080
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
12 May 2025
Dong Shu, Xuansheng Wu, Haiyan Zhao, Jundong Li, Ninghao Liu
LLMSV
Papers citing "Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders"
9 / 9 papers shown
Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders
Xuansheng Wu, Jiayi Yuan, Wenlin Yao, Xiaoming Zhai, Ninghao Liu
LLMSV
147
10
0
24 Feb 2025
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Jundong Li
LLMSV
110
9
0
17 Feb 2025
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Shruti Joshi, Andrea Dittadi, Sébastien Lachapelle, Dhanya Sridhar
LLMSV
81
2
0
14 Feb 2025
Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders
Kola Ayonrinde
68
5
0
04 Nov 2024
Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups
Davide Ghilardi, Federico Belotti, Marco Molinari
64
5
0
28 Oct 2024
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
KELM
LLMSV
101
25
0
21 Oct 2024
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun, Jordan K. Taylor, Nicholas Goldowsky-Dill, Lee D. Sharkey
70
39
0
17 May 2024
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang
RALM
286
8,134
0
16 Jun 2016
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, Andrej Risteski
83
283
0
14 Jan 2016