VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu
29 October 2024 · arXiv:2410.23317 · VLM
Papers citing "VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration" (7 of 7 papers shown):
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, ..., Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yi Yang, Lili Qiu
22 Apr 2025

AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang, Hao Zou, Bochen Wang, Ye Xi, Zhen Xie, Hao Wang
31 Mar 2025 · VLM

Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang, Bowen Cao, Anran Zhang, Weibo Gu, Winston Hu, Guang Chen
26 Mar 2025 · VLM

AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Zheng Lin, Liqiang Nie
16 Mar 2025 · VLM

Slim attention: cut your context memory in half without loss of accuracy -- K-cache is all you need for MHA
Nils Graef, Andrew Wasielewski
07 Mar 2025

CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
Zeliang Zhang, Yifan Zhu, Susan Liang, Zhiyuan Wang, Jiani Liu, ..., Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao
15 Feb 2025 · VLM, MQ

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang, Qingyi Si, Jianlong Wu, Shiyu Zhu, Zheng Lin, Liqiang Nie
29 Dec 2024 · VLM