2408.00657
Cited By
Disentangling Dense Embeddings with Sparse Autoencoders
1 August 2024
Charles O'Neill
Christine Ye
K. Iyer
John F. Wu
Papers citing
"Disentangling Dense Embeddings with Sparse Autoencoders"
4 / 4 papers shown
Title
Sparse Autoencoders, Again?
Yin Lu
X. Zhu
Tong He
David Wipf
108
0
0
05 Jun 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
LLMSV
140
0
0
23 May 2025
Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations
Aaron Jiaxun Li
Suraj Srinivas
Usha Bhalla
Himabindu Lakkaraju
AAML
178
0
0
21 May 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Blake Bullwinkel
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangde
LLMSV
174
18
0
18 Nov 2024