arXiv:2502.11356
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
17 February 2025
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Mengnan Du
LLMSV
Papers citing "SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models"
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu, Xuansheng Wu, Haiyan Zhao, Mengnan Du, Ninghao Liu
LLMSV
12 May 2025