arXiv:2505.15038
Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering
21 May 2025
Haiyan Zhao
Xuansheng Wu
Fan Yang
Bo Shen
Ninghao Liu
Mengnan Du
LLMSV
Papers cited by "Denoising Concept Vectors with Sparse Autoencoders for Improved Language Model Steering"
8 / 8 papers shown
SAIF: A Sparse Autoencoder Framework for Interpreting and Steering Instruction Following of Language Models
Z. He
Haiyan Zhao
Yiran Qiao
Fan Yang
Ali Payani
Jing Ma
Jundong Li
LLMSV
100
9
0
17 Feb 2025
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
107
26
0
15 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
89
5
0
30 Sep 2024
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin
James Wilken-Smith
Tomáš Dulka
Hardik Bhatnagar
Joseph Bloom
86
35
0
22 Sep 2024
Uncovering Latent Chain of Thought Vectors in Language Models
Jason Zhang
Scott Viteri
LLMSV
LRM
95
3
0
21 Sep 2024
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
79
110
0
11 Nov 2023
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim
Martin Wattenberg
Justin Gilmer
Carrie J. Cai
James Wexler
Fernanda Viégas
Rory Sayres
FAtt
211
1,842
0
30 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
687
131,526
0
12 Jun 2017