ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.04625
  4. Cited By
Copy Suppression: Comprehensively Understanding an Attention Head

Copy Suppression: Comprehensively Understanding an Attention Head

6 October 2023
Callum McDougall
Arthur Conmy
Cody Rushing
Thomas McGrath
Neel Nanda
    MILM
ArXiv (abs)PDFHTML

Papers citing "Copy Suppression: Comprehensively Understanding an Attention Head"

16 / 16 papers shown
Title
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking
Wuwei Zhang
Fangcong Yin
Howard Yen
Danqi Chen
Xi Ye
LRM
76
0
0
11 Jun 2025
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification
Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification
Abdelrahman Sayed Sayed
Pierre-Jean Meyer
Mohamed Ghazel
26
0
0
03 Jun 2025
Do Language Models Use Their Depth Efficiently?
Do Language Models Use Their Depth Efficiently?
Róbert Csordás
Christopher D. Manning
Christopher Potts
208
2
0
20 May 2025
Taming Knowledge Conflicts in Language Models
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
86
2
0
14 Mar 2025
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal
Pietro Lesci
Max Muller-Eberstein
Naomi Saphra
Hailey Schoelkopf
Willem H. Zuidema
Stella Biderman
LRM
108
2
0
12 Mar 2025
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Finite State Automata Inside Transformers with Chain-of-Thought: A Mechanistic Study on State Tracking
Yifan Zhang
Wenyu Du
Dongming Jin
Jie Fu
Zhi Jin
LRM
134
2
0
27 Feb 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELMMU
98
0
0
27 Feb 2025
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification
Vishnu Kabir Chhabra
Ding Zhu
Mohammad Mahdi Khalili
99
2
0
27 Feb 2025
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xiang Wang
Yan Hu
Wenyu Du
Reynold Cheng
Benyou Wang
Difan Zou
147
3
0
17 Feb 2025
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Yaniv Nikankin
Anja Reusch
Aaron Mueller
Yonatan Belinkov
AIFinLRM
129
33
0
28 Oct 2024
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability
Zhongxiang Sun
Xiaoxue Zang
Kai Zheng
Yang Song
Jun Xu
Xiao Zhang
Weijie Yu
Yang Song
Han Li
124
17
0
15 Oct 2024
Multilevel Interpretability Of Artificial Neural Networks: Leveraging
  Framework And Methods From Neuroscience
Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He
Jascha Achterberg
Katie Collins
Kevin K. Nejad
Danyal Akarca
...
Chole Li
Kai J. Sandbrink
Stephen Casper
Anna Ivanova
Grace W. Lindsay
AI4CE
97
2
0
22 Aug 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
190
33
0
02 Jul 2024
The Remarkable Robustness of LLMs: Stages of Inference?
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
Max Tegmark
115
53
0
27 Jun 2024
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould
Euan Ong
George Ogden
Arthur Conmy
LRM
44
52
0
14 Dec 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
68
319
0
28 Apr 2023
1