ResearchTrend.AI
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference

26 June 2024
Zhongwei Wan
Ziang Wu
Che Liu
Jinfa Huang
Zhihong Zhu
Peng Jin
Longyue Wang
Li Yuan
    VLM

Papers citing "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference"

26 / 26 papers shown
Static or Dynamic: Towards Query-Adaptive Token Selection for Video Question Answering
Yumeng Shi
Quanyu Long
Wenya Wang
66
0
0
30 Apr 2025
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features
Jewon Lee
Ki-Ung Song
Seungmin Yang
Donguk Lim
Jaeyeon Kim
Wooksu Shin
Bo-Kyeong Kim
Yong Jae Lee
Tae-Ho Kim
VLM
55
0
0
01 Apr 2025
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models
Keda Tao
Haoxuan You
Yang Sui
Can Qin
Haoyu Wang
VLM
MQ
91
0
0
20 Mar 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
86
3
0
16 Mar 2025
LLaVA-MLB: Mitigating and Leveraging Attention Bias for Training-Free Video LLMs
Leqi Shen
Tao He
Guoqiang Gong
Fan Yang
Yuhui Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
50
0
0
14 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Yiming Li
VLM
65
0
0
11 Mar 2025
MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
Zhongwei Wan
H. Shen
Xin Wang
Junfeng Fang
Zheda Mai
Hao Fei
VLM
65
3
0
24 Feb 2025
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
Zeliang Zhang
Yifan Zhu
Susan Liang
Zhiyuan Wang
Jiani Liu
...
Mingjie Zhao
Chenliang Xu
Kun Wan
Wentian Zhao
VLM
MQ
43
0
0
15 Feb 2025
MPCache: MPC-Friendly KV Cache Eviction for Efficient Private Large Language Model Inference
Wenxuan Zeng
Ye Dong
Jinjin Zhou
Junming Ma
Jin Tan
Runsheng Wang
Meng Li
49
0
0
12 Jan 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
82
6
0
29 Dec 2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
Manli Shu
Silvio Savarese
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
46
13
0
21 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
31
13
0
15 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
Yefei He
Feng Chen
Jing Liu
Wenqi Shao
Hong Zhou
Kaipeng Zhang
Bohan Zhuang
VLM
52
11
0
11 Oct 2024
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu
Jiaqi Song
Chao-Han Huck Yang
Shinji Watanabe
28
0
0
03 Oct 2024
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction
Zhenmei Shi
Yifei Ming
Xuan-Phi Nguyen
Yingyu Liang
Shafiq Joty
81
28
0
25 Sep 2024
Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion
Hui Shen
Zhongwei Wan
Xin Wang
Mi Zhang
Mamba
32
6
0
15 Sep 2024
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention
Huiqiang Jiang
Yucheng Li
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Chin-Yew Lin
Yuqing Yang
L. Qiu
72
84
0
02 Jul 2024
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun
Mihyeon Kim
Youngbin Kim
77
9
0
03 Jun 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
78
34
0
29 Apr 2024
SnapKV: LLM Knows What You are Looking for Before Generation
Yuhong Li
Yingbing Huang
Bowen Yang
Bharat Venkitesh
Acyr Locatelli
Hanchen Ye
Tianle Cai
Patrick Lewis
Deming Chen
VLM
79
157
0
22 Apr 2024
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu
Zhongwei Wan
Ouyang Cheng
Anand Shah
Wenjia Bai
Rossella Arcucci
42
29
0
11 Mar 2024
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao
Peng Ye
Shengze Li
Chong Yu
Yansong Tang
Jiwen Lu
Tao Chen
38
16
0
05 Mar 2024
No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization
J. Yang
Byeongwook Kim
Jeongin Bae
Beomseok Kwon
Gunho Park
Eunho Yang
S. Kwon
Dongsoo Lee
MQ
42
45
0
28 Feb 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
943
0
21 Dec 2023
Self-consistent Reasoning For Solving Math Word Problems
Jing Xiong
Zhongwei Wan
Xiping Hu
Min Yang
Chengming Li
ReLM
LRM
54
11
0
27 Oct 2022