Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2407.04690
Cited By
Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks
5 July 2024
Aaron Mueller
CML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks"
10 / 10 papers shown
Title
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
48
1
0
17 Apr 2025
AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár
Tom Lieberum
Rohin Shah
Neel Nanda
KELM
45
42
0
01 Mar 2024
Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando
Elena Voita
54
35
0
27 Feb 2024
Characterizing Mechanisms for Factual Recall in Language Models
Qinan Yu
Jack Merullo
Ellie Pavlick
KELM
42
23
0
24 Oct 2023
Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed
Can Rager
Arthur Conmy
68
56
0
16 Oct 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
189
120
0
30 Apr 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
191
261
0
28 Apr 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
75
98
0
05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
496
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
1