arXiv:2411.18895
Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks
28 November 2024
Adam Karvonen
Can Rager
Samuel Marks
Neel Nanda
Papers citing "Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks"
5 / 5 papers shown
Title
Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations
Aaron Jiaxun Li
Suraj Srinivas
Usha Bhalla
Himabindu Lakkaraju
AAML
157
0
0
21 May 2025
Ensembling Sparse Autoencoders
Soham Gadgil
Chris Lin
Su-In Lee
87
0
0
21 May 2025
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann
Noa Nabeshima
Adam Karvonen
Neel Nanda
129
13
0
21 Mar 2025
Discovering Chunks in Neural Embeddings for Interpretability
Shuchen Wu
Stephan Alaniz
Eric Schulz
Zeynep Akata
109
0
0
03 Feb 2025
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi
Hiroki Furuta
Yusuke Iwasawa
Y. Matsuo
126
3
0
09 Jan 2025