arXiv:2505.16505
Sparse Activation Editing for Reliable Instruction Following in Narratives
22 May 2025
Runcong Zhao, Chengyu Cao, Qinglin Zhu, Xiucheng Lv, Shun Shao, Lin Gui, Ruifeng Xu, Yulan He
Papers citing "Sparse Activation Editing for Reliable Instruction Following in Narratives"
4 / 4 papers shown
Title
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He, Haiyan Zhao, Yiran Qiao, Fan Yang, Ali Payani, Jing Ma, Jundong Li
LLMSV
121
9
0
17 Feb 2025
RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following
Junru Lu, Jiazheng Li, Guodong Shen, Lin Gui, Siyu An, Yulan He, Di Yin, Xing Sun
53
1
0
17 Feb 2025
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
Yu Zhao, Alessio Devoto, Giwon Hong, Xiaotang Du, Aryo Pradipta Gema, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
KELM
LLMSV
134
28
0
21 Oct 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
173
159
0
28 Mar 2024