2410.15999
Cited By
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
21 October 2024
Yu Zhao
Alessio Devoto
Giwon Hong
Xiaotang Du
Aryo Pradipta Gema
Hongru Wang
Xuanli He
Kam-Fai Wong
Pasquale Minervini
KELM
LLMSV
Papers citing
"Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering"
14 / 14 papers shown
Title
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
Ziyang Huang
Xiaowei Yuan
Yiming Ju
Jun Zhao
Kang Liu
RALM
KELM
26
0
0
12 May 2025
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
Dong Shu
Xuansheng Wu
Haiyan Zhao
Jundong Li
Ninghao Liu
LLMSV
42
0
0
12 May 2025
Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering
Jessica Y. Bo
Tianyu Xu
Ishan Chatterjee
Katrina Passarella-Ward
Achin Kulshrestha
D Shin
LLMSV
82
0
0
07 May 2025
EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
Ziwen Xu
Shuxun Wang
Kewei Xu
Haoming Xu
Mengru Wang
Xinle Deng
Yunzhi Yao
Guozhou Zheng
H. Chen
Ningyu Zhang
KELM
LLMSV
158
0
0
21 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
30
1
0
09 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
29
1
0
06 Apr 2025
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song
Xuwei Ding
Jieyu Zhang
Taiwei Shi
Ryotaro Shimizu
Rahul Gupta
Yong-Jin Liu
Jian Kang
Jieyu Zhao
KELM
61
0
0
30 Mar 2025
SAKE: Steering Activations for Knowledge Editing
Marco Scialanga
Thibault Laugel
Vincent Grari
Marcin Detyniecki
KELM
LLMSV
74
1
0
03 Mar 2025
Steered Generation via Gradient Descent on Sparse Features
Sumanta Bhattacharyya
Pedram Rooshenas
LLMSV
43
0
0
25 Feb 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
102
0
0
24 Feb 2025
Sparse Autoencoder Features for Classifications and Transferability
Jack Gallifant
Shan Chen
Kuleen Sasse
Hugo J. W. L. Aerts
Thomas Hartvigsen
Danielle S. Bitterman
46
3
0
17 Feb 2025
Designing Role Vectors to Improve LLM Inference Behaviour
Daniele Potertì
Andrea Seveso
Fabio Mercorio
LLMSV
54
0
0
17 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
67
10
0
18 Nov 2024
Improving Steering Vectors by Targeting Sparse Autoencoder Features
Sviatoslav Chalnev
Matthew Siu
Arthur Conmy
LLMSV
52
16
0
04 Nov 2024