Analyzing Transformers in Embedding Space
Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
arXiv:2209.02535, 6 September 2022
Papers citing "Analyzing Transformers in Embedding Space" (16 papers)
1. Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
   Tyler A. Chang, Benjamin Bergen (21 Apr 2025)

2. The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis
   Ge Lei, Samuel J. Cooper (15 Feb 2025) [KELM]

3. Understanding Multimodal LLMs: the Mechanistic Interpretability of Llava in Visual Question Answering
   Zeping Yu, Sophia Ananiadou (17 Nov 2024)

4. From Tokens to Words: On the Inner Lexicon of LLMs
   Guy Kaplan, Matanel Oren, Yuval Reif, Roy Schwartz (08 Oct 2024)

5. A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
   Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao (02 Jul 2024)

6. REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space
   Tomer Ashuach, Martin Tutek, Yonatan Belinkov (13 Jun 2024) [KELM, MU]

7. Dissecting Query-Key Interaction in Vision Transformers
   Xu Pan, Aaron Philip, Ziqian Xie, Odelia Schwartz (04 Apr 2024)

8. Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
   Asma Ghandeharioun, Avi Caciularu, Adam Pearce, Lucas Dixon, Mor Geva (11 Jan 2024)

9. Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings
   Timothee Mickus, Raúl Vázquez (10 Oct 2023)

10. DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
    Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet (05 Oct 2023)

11. Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
    Fred Zhang, Neel Nanda (27 Sep 2023) [LLMSV]

12. Explaining How Transformers Use Context to Build Predictions
    Javier Ferrando, Gerard I. Gállego, Ioannis Tsiamas, Marta R. Costa-jussà (21 May 2023)

13. Eliciting Latent Predictions from Transformers with the Tuned Lens
    Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor V. Ostrovsky, Lev McKinney, Stella Biderman, Jacob Steinhardt (14 Mar 2023)

14. Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
    Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)

15. Mass-Editing Memory in a Transformer
    Kevin Meng, Arnab Sen Sharma, A. Andonian, Yonatan Belinkov, David Bau (13 Oct 2022) [KELM, VLM]

16. The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
    Elena Voita, Rico Sennrich, Ivan Titov (03 Sep 2019)