Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
arXiv:2310.08744 · 12 October 2023

Papers citing "Circuit Component Reuse Across Tasks in Transformer Language Models" (8 of 58 papers shown)

Universal Neurons in GPT2 Language Models
Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas · 22 Jan 2024 · [MILM]

Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
Michael Lan, Phillip H. S. Torr, Fazl Barez · 07 Nov 2023 · [LRM]

Language Models Implement Simple Word2Vec-style Vector Arithmetic
Jack Merullo, Carsten Eickhoff, Ellie Pavlick · 25 May 2023 · [KELM]

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas · 02 May 2023 · [MILM]

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien · 30 Apr 2023 · [LRM]

Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson · 28 Apr 2023 · [KELM]

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt · 01 Nov 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah · 24 Sep 2022