ResearchTrend.AI
Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs

4 April 2025
Pedro Sandoval-Segura
Xijun Wang
Ashwinee Panda
Micah Goldblum
Ronen Basri
Tom Goldstein
David Jacobs
Abstract

Multi-head attention is foundational to large language models (LLMs), enabling different heads to focus on different relevant input tokens. However, learned behaviors like attention sinks, where the first token receives most attention despite limited semantic importance, challenge our understanding of multi-head attention. To analyze this phenomenon, we propose a new definition for attention heads dominated by attention sinks, which we term dormant attention heads. We compare our definition to prior work in a model intervention study, testing whether dormant heads matter for inference by zeroing out their outputs. Across six pretrained models and five benchmark datasets, we find our definition to be more model- and dataset-agnostic. Under our definition, on most models more than 4% of a model's attention heads can be zeroed while maintaining average accuracy, and zeroing more than 14% of a model's attention heads keeps accuracy within 1% of the pretrained model's average accuracy. Further analysis reveals that dormant heads emerge early in pretraining and can transition between dormant and active states during pretraining. Additionally, we provide evidence that they depend on characteristics of the input text.
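The intervention described in the abstract can be sketched in a few lines: flag a head as dormant when its attention mass concentrates on the first (sink) token, then zero that head's output before the heads are combined. This is a minimal NumPy illustration; the 0.9 sink-mass threshold and the averaging criterion are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def dormant_head_mask(attn, sink_threshold=0.9):
    """Flag dormant heads from attention weights.

    attn: array of shape (num_heads, query_len, key_len), rows summing to 1.
    A head is flagged dormant when its attention mass on the first (sink)
    token, averaged over queries, exceeds sink_threshold.
    NOTE: threshold and criterion are assumptions for illustration.
    """
    sink_mass = attn[:, :, 0].mean(axis=1)  # per-head avg attention on token 0
    return sink_mass > sink_threshold

def zero_dormant_heads(head_outputs, mask):
    """Zero the outputs of flagged heads (the intervention sketched above).

    head_outputs: array of shape (num_heads, seq_len, head_dim).
    """
    out = head_outputs.copy()
    out[mask] = 0.0  # boolean indexing zeroes entire dormant heads
    return out
```

For example, a head whose every query row places all attention on token 0 would be flagged and its output zeroed, while a head with diffuse attention would pass through unchanged. In a real model this masking would be applied to the per-head outputs before the output projection.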

View on arXiv
@article{sandoval-segura2025_2504.03889,
  title={Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs},
  author={Pedro Sandoval-Segura and Xijun Wang and Ashwinee Panda and Micah Goldblum and Ronen Basri and Tom Goldstein and David Jacobs},
  journal={arXiv preprint arXiv:2504.03889},
  year={2025}
}