ResearchTrend.AI
arXiv: 2411.18895
Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks

28 November 2024
Adam Karvonen
Can Rager
Samuel Marks
Neel Nanda

Papers citing "Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks"

5 / 5 papers shown
Interpretability Illusions with Sparse Autoencoders: Evaluating Robustness of Concept Representations
Aaron Jiaxun Li
Suraj Srinivas
Usha Bhalla
Himabindu Lakkaraju
AAML
21 May 2025
Ensembling Sparse Autoencoders
Soham Gadgil
Chris Lin
Su-In Lee
21 May 2025
Learning Multi-Level Features with Matryoshka Sparse Autoencoders
Bart Bussmann
Noa Nabeshima
Adam Karvonen
Neel Nanda
21 Mar 2025
Discovering Chunks in Neural Embeddings for Interpretability
Shuchen Wu
Stephan Alaniz
Eric Schulz
Zeynep Akata
03 Feb 2025
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi
Hiroki Furuta
Yusuke Iwasawa
Y. Matsuo
09 Jan 2025