KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
arXiv:2407.01527, 1 July 2024
Jiayi Yuan, Hongyi Liu, Shaochen Zhong, Yu-Neng Chuang, Songchen Li, Guanchu Wang, Duy Le, Hongye Jin, V. Chaudhary, Zhaozhuo Xu, Zirui Liu, Xia Hu

Papers citing "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches" (19 papers shown)

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, E. Ponti
24 Apr 2025

LagKV: Lag-Relative Information of the KV Cache Tells Which Tokens Are Important
Manlai Liang, JiaMing Zhang, Xiong Li, Jinlong Li
07 Apr 2025 [MQ]

SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Shibo Jie, Yehui Tang, Kai Han, Zhi-Hong Deng, Jing Han
20 Mar 2025

Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui, Yu-Neng Chuang, Guanchu Wang, Jiamu Zhang, Tianyi Zhang, ..., Hongyi Liu, Andrew Wen, Shaochen Zhong, Hanjie Chen
20 Mar 2025 [OffRL, ReLM, LRM]

Cost-Optimal Grouped-Query Attention for Long-Context Modeling
Yuxiao Chen, Yutong Wu, Chenyang Song, Xu Han, Zhiyuan Liu, Maosong Sun
12 Mar 2025

Dialogue Without Limits: Constant-Sized KV Caches for Extended Responses in LLMs
Ravi Ghadia, Avinash Kumar, Gaurav Jain, Prashant J. Nair, Poulami Das
02 Mar 2025

The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li
24 Feb 2025 [LRM]

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu
04 Feb 2025

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format
Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst
24 Nov 2024

Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang, Binhang Yuan, Xu Han, Chaojun Xiao, Zhiyuan Liu
02 Oct 2024 [RALM]

ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency
Yuhang Yao, Han Jin, Alay Dilipbhai Shah, Shanshan Han, Zijian Hu, Yide Ran, Dimitris Stripelis, Zhaozhuo Xu, Salman Avestimehr, Chang D. Yoo
23 Jul 2024

QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
A. Zandieh, Majid Daliri, Insu Han
05 Jun 2024 [MQ]

SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li, Yingbing Huang, Bowen Yang, Bharat Venkitesh, Acyr Locatelli, Hanchen Ye, Tianle Cai, Patrick Lewis, Deming Chen
22 Apr 2024 [VLM]

HGRN2: Gated Linear RNNs with State Expansion
Zhen Qin, Aaron Courville, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong
11 Apr 2024 [LRM]

QAQ: Quality Adaptive Quantization for LLM KV Cache
Shichen Dong, Wenfang Cheng, Jiayu Qin, Wei Wang
07 Mar 2024 [MQ]

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre
29 Feb 2024 [Mamba]

No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang, Byeongwook Kim, Jeongin Bae, Beomseok Kwon, Gunho Park, Eunho Yang, S. Kwon, Dongsoo Lee
28 Feb 2024 [MQ]

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu, Vladimir Braverman, Beidi Chen, Xia Hu
05 Feb 2024 [MQ]

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, ..., Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
13 Mar 2023