Copy Suppression: Comprehensively Understanding an Attention Head

6 October 2023

Callum McDougall

Papers citing "Copy Suppression: Comprehensively Understanding an Attention Head"

Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking. Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye. 11 Jun 2025.
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification. Abdelrahman Sayed Sayed, Pierre-Jean Meyer, Mohamed Ghazel. 03 Jun 2025.
Do Language Models Use Their Depth Efficiently? Róbert Csordás, Christopher D. Manning, Christopher Potts. 20 May 2025.
Taming Knowledge Conflicts in Language Models. Gaotang Li, Yuzhong Chen, Hanghang Tong. 14 Mar 2025.
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs. Oskar van der Wal, Pietro Lesci, Max Muller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem H. Zuidema, Stella Biderman. 12 Mar 2025.
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking. Yifan Zhang, Wenyu Du, Dongming Jin, Jie Fu, Zhi Jin. 27 Feb 2025.
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries. Tianyi Lorena Yan, Robin Jia. 27 Feb 2025.
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification. Vishnu Kabir Chhabra, Ding Zhu, Mohammad Mahdi Khalili. 27 Feb 2025.
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis. Xiang Wang, Yan Hu, Wenyu Du, Reynold Cheng, Benyou Wang, Difan Zou. 17 Feb 2025.
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics. Yaniv Nikankin, Anja Reusch, Aaron Mueller, Yonatan Belinkov. 28 Oct 2024.
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability. Zhongxiang Sun, Xiaoxue Zang, Kai Zheng, Yang Song, Jun Xu, Xiao Zhang, Weijie Yu, Yang Song, Han Li. 15 Oct 2024.
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience. Zhonghao He, Jascha Achterberg, Katie Collins, Kevin K. Nejad, Danyal Akarca, ..., Chole Li, Kai J. Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay. 22 Aug 2024.
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models. Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao. 02 Jul 2024.
The Remarkable Robustness of LLMs: Stages of Inference? Vedang Lad, Wes Gurnee, Max Tegmark. 27 Jun 2024.
Successor Heads: Recurring, Interpretable Attention Heads In The Wild. Rhys Gould, Euan Ong, George Ogden, Arthur Conmy. 14 Dec 2023.
Towards Automated Circuit Discovery for Mechanistic Interpretability. Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso. 28 Apr 2023.