SeerAttention-R: Sparse Attention Adaptation for Long Reasoning

10 June 2025
Yizhao Gao
Shuming Guo
Shijie Cao
Yuqing Xia
Yu Cheng
Lei Wang
Lingxiao Ma
Yutao Sun
Tianzhu Ye
Li Dong
Hayden Kwok-Hay So
Yu Hua
Ting Cao
Fan Yang
Mao Yang
VLM · LRM
arXiv (abs) · PDF · HTML
Abstract

We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. With a lightweight plug-in gate, SeerAttention-R is flexible and can be easily integrated into existing pretrained models without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with a 4K token budget on the AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on an H100 GPU at 90% sparsity. Code is available at: this https URL.
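To make the abstract's mechanism concrete, the sketch below illustrates the general idea of a plug-in gate that scores cached KV blocks for a single decode-step query and restricts attention to the selected blocks. This is not the authors' implementation: the block size, mean-pooling of block keys, the linear gate projection, and top-k selection are all assumptions for illustration; in the paper the gate is learned by self-distillation and the sparse computation is performed by an optimized TileLang kernel rather than by masking dense scores.

```python
# Minimal sketch (assumed, not the paper's code) of gated block-sparse decoding.
import torch
import torch.nn.functional as F

def gated_block_sparse_decode(q, k_cache, v_cache, gate_proj,
                              block_size=64, topk_blocks=16):
    """
    q:         (heads, head_dim)          query for the current decode step
    k_cache:   (seq_len, heads, head_dim) cached keys
    v_cache:   (seq_len, heads, head_dim) cached values
    gate_proj: torch.nn.Linear(head_dim, head_dim), a stand-in for the learned gate
    """
    seq_len, n_heads, head_dim = k_cache.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Pool keys within each block (mean pooling is an assumption; the paper
    # learns the gate by self-distillation from full-attention scores).
    pad = n_blocks * block_size - seq_len
    k_pad = F.pad(k_cache, (0, 0, 0, 0, 0, pad))
    k_blocks = k_pad.view(n_blocks, block_size, n_heads, head_dim).mean(dim=1)

    # Gate: score each block against the current query, keep top-k blocks per head.
    gate_scores = torch.einsum('hd,bhd->hb', q, gate_proj(k_blocks))
    keep = gate_scores.topk(min(topk_blocks, n_blocks), dim=-1).indices

    # Expand the block selection to a per-token mask. A real sparse kernel would
    # skip the unselected blocks entirely instead of masking dense scores.
    block_mask = torch.zeros(n_heads, n_blocks, dtype=torch.bool, device=q.device)
    block_mask.scatter_(1, keep, True)
    token_mask = block_mask.repeat_interleave(block_size, dim=1)[:, :seq_len]

    # Attention restricted to the selected blocks.
    attn = torch.einsum('hd,shd->hs', q, k_cache) / head_dim ** 0.5
    attn = attn.masked_fill(~token_mask, float('-inf')).softmax(dim=-1)
    return torch.einsum('hs,shd->hd', attn, v_cache)

# Example usage (hypothetical shapes):
# gate = torch.nn.Linear(128, 128, bias=False)
# out = gated_block_sparse_decode(torch.randn(8, 128),
#                                 torch.randn(4096, 8, 128),
#                                 torch.randn(4096, 8, 128), gate)
```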

@article{gao2025_2506.08889,
  title={SeerAttention-R: Sparse Attention Adaptation for Long Reasoning},
  author={Yizhao Gao and Shuming Guo and Shijie Cao and Yuqing Xia and Yu Cheng and Lei Wang and Lingxiao Ma and Yutao Sun and Tianzhu Ye and Li Dong and Hayden Kwok-Hay So and Yu Hua and Ting Cao and Fan Yang and Mao Yang},
  journal={arXiv preprint arXiv:2506.08889},
  year={2025}
}
Main: 12 pages · 10 figures · Bibliography: 6 pages · 2 tables