arXiv: 2406.02069
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
4 June 2024
Zefan Cai
Yichi Zhang
Bofei Gao
Yuliang Liu
Yong Li
Keming Lu
Wayne Xiong
Yue Dong
Baobao Chang
Junjie Hu
Wen Xiao
Papers citing "PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling"
16 / 66 papers shown
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu
Danylo Vashchilenko
Yuzhe Lu
Panpan Xu
VLM
48
9
0
29 Oct 2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun
Li-Wen Chang
Yiyuan Ma
Wenlei Bao
Ningxin Zheng
Xin Liu
Harry Dong
Yuejie Chi
Beidi Chen
VLM
88
16
0
28 Oct 2024
KV Prediction for Improved Time to First Token
Maxwell Horton
Qingqing Cao
Chenfan Sun
Yanzi Jin
Sachin Mehta
Mohammad Rastegari
Moin Nabi
AI4TS
39
1
0
10 Oct 2024
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
Zilin Xiao
Hongming Zhang
Tao Ge
Siru Ouyang
Vicente Ordonez
Dong Yu
39
5
0
08 Oct 2024
LoRC: Low-Rank Compression for LLMs KV Cache with a Progressive Compression Strategy
Rongzhi Zhang
Kuang Wang
Liyuan Liu
Shuohang Wang
Hao Cheng
Chao Zhang
Yelong Shen
MQ
26
5
0
04 Oct 2024
Locret: Enhancing Eviction in Long-Context LLM Inference with Trained Retaining Heads on Consumer-Grade Devices
Yuxiang Huang
Binhang Yuan
Xu Han
Chaojun Xiao
Zhiyuan Liu
RALM
84
1
0
02 Oct 2024
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge
Xihui Lin
Yunan Zhang
Jiawei Han
Hao Peng
33
4
0
02 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng
Han Shi
Xian Liu
Xuefei Ning
Guohao Dai
Yu Wang
Zhenguo Li
Xihui Liu
58
10
0
02 Oct 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
Yifan Tan
Haoze Wang
Chao Yan
Yangdong Deng
MQ
31
2
0
25 Sep 2024
Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
Sixiang Chen
Tian-Chun Ye
Kaicheng Zhang
Zhaohu Xing
Yunlong Lin
Lei Zhu
DiffM
46
9
0
24 Sep 2024
Cross-layer Attention Sharing for Large Language Models
Yongyu Mu
Yuzhang Wu
Yuchun Fan
Chenglong Wang
Hengyu Li
Qiaozhi He
Murun Yang
Tong Xiao
Jingbo Zhu
42
5
0
04 Aug 2024
ThinK: Thinner Key Cache by Query-Driven Pruning
Yuhui Xu
Zhanming Jie
Hanze Dong
Lei Wang
Xudong Lu
Aojun Zhou
Amrita Saha
Caiming Xiong
Doyen Sahoo
75
14
0
30 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
79
157
0
22 Apr 2024
Massive Activations in Large Language Models
Mingjie Sun
Xinlei Chen
J. Zico Kolter
Zhuang Liu
74
68
0
27 Feb 2024
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu
Nan Yang
Liang Wang
Yifan Song
Wenhao Wu
Furu Wei
Sujian Li
76
78
0
19 Sep 2023