Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters
Zhiyu Guo, Hidetaka Kamigaito, Taro Watanabe
arXiv 2406.12335, 18 June 2024
Links: ArXiv | PDF | HTML
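The title states the paper's thesis: attention scores alone are an incomplete token-importance signal for KV cache reduction, since a heavily attended token whose value vector is near zero still contributes little to the attention output. Below is a minimal PyTorch sketch of a value-aware importance score in that spirit; the specific formula (accumulated attention weight times the value vector's L1 norm) and every name and shape here are illustrative assumptions, not an API or exact method taken from this page.

```python
import torch

def value_aware_token_scores(attn_weights: torch.Tensor,
                             values: torch.Tensor) -> torch.Tensor:
    """Score cached tokens by accumulated attention times value-vector norm.

    attn_weights: (num_heads, q_len, kv_len) softmax attention weights.
    values:       (num_heads, kv_len, head_dim) cached value vectors.
    Returns:      (num_heads, kv_len) importance scores.
    """
    # Total attention mass each cached token receives across all queries.
    attn_mass = attn_weights.sum(dim=1)            # (num_heads, kv_len)
    # L1 norm of each token's value vector: a token with near-zero values
    # adds little to the output no matter how strongly it is attended.
    value_norm = values.abs().sum(dim=-1)          # (num_heads, kv_len)
    return attn_mass * value_norm

def prune_kv_cache(keys, values, attn_weights, keep):
    """Hypothetical helper: retain the `keep` highest-scoring tokens per head."""
    scores = value_aware_token_scores(attn_weights, values)
    idx = scores.topk(keep, dim=-1).indices.sort(dim=-1).values  # keep order
    idx = idx.unsqueeze(-1).expand(-1, -1, keys.size(-1))
    return keys.gather(1, idx), values.gather(1, idx)

# Example: keep 128 of 1024 cached tokens.
# k, v: (num_heads, 1024, head_dim); attn: (num_heads, q_len, 1024)
# k_small, v_small = prune_kv_cache(k, v, attn, keep=128)
```

Sorting the surviving indices keeps the cache in its original token order; how attention is accumulated (over all queries or only recent ones) is a design detail this sketch does not take a position on.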
Papers citing "Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters" (7 / 7 papers shown):
- "Using Attention Sinks to Identify and Evaluate Dormant Heads in Pretrained LLMs." Pedro Sandoval-Segura, Xijun Wang, Ashwinee Panda, Micah Goldblum, Ronen Basri, Tom Goldstein, David Jacobs. 04 Apr 2025.
- "Cognitive Memory in Large Language Models." Lianlei Shan, Shixian Luo, Zezhou Zhu, Yu Yuan, Yong Wu. 03 Apr 2025. Tags: LLMAG, KELM.
- "DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models." Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari, Yong Zhang. 04 Mar 2025. Tags: VLM.
- "ControlMM: Controllable Masked Motion Generation." Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Cheng Chen, Chuan Guo, Junli Cao, J. Ren, Sergey Tulyakov. 14 Oct 2024. Tags: VGen.
- "Mixture Compressor for Mixture-of-Experts LLMs Gains More." Wei Huang, Yue Liao, Jianhui Liu, Ruifei He, Haoru Tan, Shiming Zhang, Hongsheng Li, Si Liu, Xiaojuan Qi. 08 Oct 2024. Tags: MoE.
- "SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering." John Yang, Carlos E. Jimenez, Alexander Wettig, K. Lieret, Shunyu Yao, Karthik Narasimhan, Ofir Press. 06 May 2024. Tags: LLMAG.
- "Transkimmer: Transformer Learns to Layer-wise Skim." Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin, Minyi Guo. 15 May 2022.