Transformers are uninterpretable with myopic methods: a case study with
bounded Dyck grammars

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars

3 December 2023

Andrej Risteski

Papers citing "Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars"

12 / 12 papers shown

Title
ICLR: In-Context Learning of Representations Core Francisco Park Andrew Lee Ekdeep Singh Lubana Yongyi Yang Maya Okawa Kento Nishi Martin Wattenberg Hidenori Tanaka AIFin 120 3 0 29 Dec 2024
Training Neural Networks as Recognizers of Formal Languages Alexandra Butoi Ghazal Khalighinejad Anej Svete Josef Valvoda Ryan Cotterell Brian DuSell NAI 44 2 0 11 Nov 2024
Analyzing (In)Abilities of SAEs via Formal Languages Abhinav Menon Manish Shrivastava David M. Krueger Ekdeep Singh Lubana 50 7 0 15 Oct 2024
Separations in the Representational Capabilities of Transformers and Recurrent Architectures S. Bhattamishra Michael Hahn Phil Blunsom Varun Kanade GNN 44 9 0 13 Jun 2024
Linear Transformers are Versatile In-Context Learners Max Vladymyrov J. Oswald Mark Sandler Rong Ge 34 13 0 21 Feb 2024
Attention Meets Post-hoc Interpretability: A Mathematical Perspective Gianluigi Lopardo F. Precioso Damien Garreau 16 4 0 05 Feb 2024
Interpretation of Intracardiac Electrograms Through Textual Representations William Jongwon Han Diana Gomez Avi Alok Chaojing Duan Michael A. Rosenberg Douglas Weber Emerson Liu Ding Zhao 26 1 0 02 Feb 2024
Do Transformers Parse while Predicting the Masked Word? Haoyu Zhao A. Panigrahi Rong Ge Sanjeev Arora 76 31 0 14 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding Yuchen Li Yuan-Fang Li Andrej Risteski 120 61 0 07 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small Kevin Wang Alexandre Variengien Arthur Conmy Buck Shlegeris Jacob Steinhardt 212 496 0 01 Nov 2022
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 250 460 0 24 Sep 2022
Probing Classifiers: Promises, Shortcomings, and Advances Yonatan Belinkov 226 405 0 24 Feb 2021