AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity

29 May 2025
Yu Zhang, Dong Guo, Fang Wu, Guoliang Zhu, Dian Ding, Yiming Zhang

Papers citing "AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity"

16 / 16 papers shown

XAttention: Block Sparse Attention with Antidiagonal Scoring
Ruyi Xu, Guangxuan Xiao, Haofeng Huang, Junxian Guo, Enze Xie
113 · 11 · 0 · 20 Mar 2025

Predicting Team Performance from Communications in Simulated Search-and-Rescue
Ali Jalal-Kamali, Nikolos Gurney, David Pynadath
Tags: AI4TS
165 · 0 · 0 · 05 Mar 2025

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, ..., Kai Zhang, Chong Chen, Fan Yang, Yue Yang, Lili Qiu
104 · 42 · 0 · 03 Jan 2025

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning
Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao
64 · 23 · 0 · 25 Oct 2024

MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu
131 · 110 · 0 · 02 Jul 2024

PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference
Dongjie Yang, Xiaodong Han, Yan Gao, Yao Hu, Shilin Zhang, Hai Zhao
64 · 61 · 0 · 21 May 2024

SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
Tags: VLM
117 · 192 · 0 · 22 Apr 2024

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, ..., Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He
51 · 69 · 0 · 09 Jan 2024

Ring Attention with Blockwise Transformers for Near-Infinite Context
Hao Liu, Matei A. Zaharia, Pieter Abbeel
89 · 241 · 0 · 03 Oct 2023

Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
Tags: AI4TS, RALM
119 · 750 · 0 · 29 Sep 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, Ion Stoica
Tags: VLM
163 · 2,197 · 0 · 12 Sep 2023

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai, Xin Lv, Jiajie Zhang, Hong Lyu, Jiankai Tang, ..., Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, Juanzi Li
Tags: LLMAG, RALM
84 · 583 · 0 · 28 Aug 2023

LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, ..., Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Tags: ALM, PILM
1.5K · 13,182 · 0 · 27 Feb 2023

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Tags: VLM
224 · 2,214 · 0 · 27 May 2022

Generating Long Sequences with Sparse Transformers
Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever
115 · 1,896 · 0 · 23 Apr 2019

Online normalizer calculation for softmax
Maxim Milakov, Natalia Gimelshein
72 · 91 · 0 · 08 May 2018