2409.04185
Cited By
Residual Stream Analysis with Multi-Layer SAEs
6 September 2024
Tim Lawson
Lucy Farnik
Conor Houghton
Laurence Aitchison
Papers citing "Residual Stream Analysis with Multi-Layer SAEs"
6 / 6 papers shown
Title
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
61
0
0
25 Feb 2025
Transformer Dynamics: A neuroscientific approach to interpretability of large language models
Jesseba Fernando
Grigori Guitchounts
AI4CE
41
0
0
17 Feb 2025
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangdeh
LLMSV
67
10
0
18 Nov 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
497
0
01 Nov 2022
Disentanglement with Biological Constraints: A Theory of Functional Cell Types
James C. R. Whittington
W. Dorrell
Surya Ganguli
Timothy Edward John Behrens
47
48
0
30 Sep 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
131
322
0
21 Sep 2022