arXiv:2410.23317
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
29 October 2024
Dezhan Tu
Danylo Vashchilenko
Yuzhe Lu
Panpan Xu
VLM
Papers citing
"VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration"
10 / 10 papers shown
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding
Mengyue Wang
Shuo Chen
Kristian Kersting
Volker Tresp
Yunpu Ma
VLM
75
0
0
03 Jun 2025
EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models
Zekun Wang
Minghua Ma
Zexin Wang
Rongchuan Mu
Liping Shan
Ming Liu
Bing Qin
VLM
49
1
0
31 May 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Yue Yang
Lili Qiu
123
5
0
22 Apr 2025
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang
Hao Zou
Bochen Wang
Ye Xi
Zhen Xie
Hao Wang
VLM
107
0
0
31 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
140
0
0
26 Mar 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
146
7
0
16 Mar 2025
Slim attention: cut your context memory in half without loss -- K-cache is all you need for MHA
Nils Graef
Matthew Clapp
117
2
0
07 Mar 2025
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
Zeliang Zhang
Yifan Zhu
Susan Liang
Zhiyuan Wang
Jiani Liu
...
Mingjie Zhao
Chenliang Xu
Kun Wan
Wentian Zhao
VLM
MQ
135
0
0
15 Feb 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
210
12
0
29 Dec 2024
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai
Yichi Zhang
Bofei Gao
Yuliang Liu
Yongqian Li
...
Wayne Xiong
Yue Dong
Baobao Chang
Junjie Hu
Wen Xiao
216
117
0
04 Jun 2024