VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers

22 May 2023

Papers citing "VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers"


Title
Relation Also Knows: Rethinking the Recall and Editing of Factual Associations in Auto-Regressive Transformer Language Models | Xiyu Liu, Zhengxiao Liu, Naibin Gu, Zheng-Shen Lin, Wanli Ma, Ji Xiang, Weiping Wang | KELM | 49 0 0 | 27 Aug 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs | Nitay Calderon, Roi Reichart | 42 10 0 | 27 Jul 2024
Finding Transformer Circuits with Edge Pruning | Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen | 68 17 0 | 24 Jun 2024
Knowledge Circuits in Pretrained Transformers | Yunzhi Yao, Ningyu Zhang, Zekun Xi, Meng Wang, Ziwen Xu, Shumin Deng, Huajun Chen | KELM | 69 20 0 | 28 May 2024
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | 212 497 0 | 01 Nov 2022
Outlier Dimensions that Disrupt Transformers Are Driven by Frequency | Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell’Orletta | 79 42 0 | 23 May 2022
All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality | William Timkey, Marten van Schijndel | 224 111 0 | 09 Sep 2021