Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
arXiv:2502.16681
23 February 2025
Subhash Kantamneni
Joshua Engels
Senthooran Rajamanoharan
Max Tegmark
Neel Nanda
Papers citing "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"
Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
James Oldfield
Shawn Im
Yixuan Li
M. Nicolaou
Ioannis Patras
Grigorios G. Chrysos
MoE
43
0
0
27 May 2025
Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms
Mengru Wang
Ziwen Xu
Shengyu Mao
Shumin Deng
Zhaopeng Tu
Ningyu Zhang
LLMSV
73
0
0
23 May 2025
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari
Danilo S. Carvalho
André Freitas
69
0
0
23 May 2025
SAE-SSV: Supervised Steering in Sparse Representation Spaces for Reliable Control of Language Models
Zirui He
Mingyu Jin
Bo Shen
Ali Payani
Yongfeng Zhang
Mengnan Du
LLMSV
59
0
0
22 May 2025
Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders
David Chanin
Tomáš Dulka
Adrià Garriga-Alonso
49
0
0
16 May 2025
Are Sparse Autoencoders Useful for Java Function Bug Detection?
Rui Melo
Claudia Mamede
Andre Catarino
Rui Abreu
Henrique Lopes Cardoso
69
0
0
15 May 2025
Investigating task-specific prompts and sparse autoencoders for activation monitoring
Henk Tillman
Dan Mossing
LLMSV
74
0
0
28 Apr 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
126
4
0
17 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Biwei Huang
Anton van den Hengel
Javen Qinfeng Shi
381
0
0
12 Mar 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
83
9
0
03 Mar 2025
Sparse Autoencoders Can Interpret Randomly Initialized Transformers
Thomas Heap
Tim Lawson
Lucy Farnik
Laurence Aitchison
49
16
0
29 Jan 2025
Decomposing The Dark Matter of Sparse Autoencoders
Joshua Engels
Logan Riggs
Max Tegmark
LLMSV
86
14
0
18 Oct 2024
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide
Joshua Engels
Eric J. Michaud
Max Tegmark
Christian Schroeder de Witt
52
13
0
10 Oct 2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure
Yuxiao Li
Eric J. Michaud
David D. Baek
Joshua Engels
Xiaoqing Sun
Max Tegmark
84
16
0
10 Oct 2024
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin
James Wilken-Smith
Tomáš Dulka
Hardik Bhatnagar
Joseph Bloom
67
33
0
22 Sep 2024
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun
Jordan K. Taylor
Nicholas Goldowsky-Dill
Lee D. Sharkey
50
39
0
17 May 2024
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
80
145
0
22 Apr 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
103
145
0
28 Mar 2024
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
420
2,081
0
31 Dec 2020
Aligning AI With Shared Human Values
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Jingkai Li
D. Song
Jacob Steinhardt
127
548
0
05 Aug 2020
"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding
Ben Zhou
Daniel Khashabi
Qiang Ning
Dan Roth
AIMat
77
196
0
06 Sep 2019
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
203
1,406
0
31 May 2018
XGBoost: A Scalable Tree Boosting System
Tianqi Chen
Carlos Guestrin
556
38,735
0
09 Mar 2016
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Sanjeev Arora
Yuanzhi Li
Yingyu Liang
Tengyu Ma
Andrej Risteski
73
282
0
14 Jan 2016