Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.11917
Cited By
A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task
19 February 2024
Jannik Brinkmann
Abhay Sheshadri
Victor Levoso
Paul Swoboda
Christian Bartelt
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task"
12 / 12 papers shown
Title
Signatures of human-like processing in Transformer forward passes
Jennifer Hu
Michael A. Lepori
Michael Franke
AI4CE
165
0
0
18 Apr 2025
Representation Engineering for Large-Language Models: Survey and Research Challenges
Lukasz Bartoszcze
Sarthak Munshi
Bryan Sukidi
Jennifer Yen
Zejia Yang
David Williams-King
Linh Le
Kosi Asuzu
Carsten Maple
102
0
0
24 Feb 2025
Transformers Use Causal World Models in Maze-Solving Tasks
Alex F Spies
William Edwards
Michael Ivanitskiy
Adrians Skapars
Tilman Rauker
Katsumi Inoue
A. Russo
Murray Shanahan
137
1
0
16 Dec 2024
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
37
0
0
02 Nov 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
Look Before You Leap: A Universal Emergent Decomposition of Retrieval Tasks in Language Models
Alexandre Variengien
Eric Winsor
LRM
ReLM
76
10
0
13 Dec 2023
Structured World Representations in Maze-Solving Transformers
Michael Ivanitskiy
Alex F Spies
Tilman Rauker
Guillaume Corlouer
Chris Mathwin
...
Rusheb Shah
Dan Valentine
Cecilia G. Diniz Behn
Katsumi Inoue
Samy Wu Fung
60
5
0
05 Dec 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
102
169
0
10 Oct 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
191
261
0
28 Apr 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
497
0
01 Nov 2022
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought
Abulhair Saparov
He He
ELM
LRM
ReLM
121
277
0
03 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
1