ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.06764
  4. Cited By
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference
  Acceleration for Large Vision-Language Models

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

11 March 2024
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
    MLLM
    VLM
ArXivPDFHTML

Papers citing "An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models"

50 / 103 papers shown
Title
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
89
26
0
31 Dec 2024
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
88
6
0
29 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hongyu Chen
Zihan Wang
Xianrui Li
Xingchen Sun
Fangyi Chen
Jiang Liu
Rongxiang Weng
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
114
7
0
14 Dec 2024
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong
Chengyao Wang
Yuqi Liu
Senqiao Yang
Longxiang Tang
...
Shaozuo Yu
Sitong Wu
Eric Lo
Shu Liu
Jiaya Jia
AuLLM
111
7
0
12 Dec 2024
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for
  Accelerating Large VLMs
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Zechao Li
Yibing Song
Kaidi Wang
Zhangyang Wang
Yang You
93
7
0
04 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and
  Pruning
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
90
3
0
04 Dec 2024
Explainable and Interpretable Multimodal Large Language Models: A
  Comprehensive Survey
Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey
Yunkai Dang
Kaichen Huang
Jiahao Huo
Yibo Yan
S. Huang
...
Kun Wang
Yong Liu
Jing Shao
Hui Xiong
Xuming Hu
LRM
104
15
0
03 Dec 2024
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free
  Fusion
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion
Zhuokun Chen
Jinwu Hu
Zeshuai Deng
Yufeng Wang
Bohan Zhuang
Mingkui Tan
71
0
0
02 Dec 2024
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
102
11
0
02 Dec 2024
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye
Yukang Gan
Yixiao Ge
Xiao Zhang
Yansong Tang
101
7
0
30 Nov 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Zheng Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
86
15
0
26 Nov 2024
Efficient Multi-modal Large Language Models via Visual Token Grouping
Efficient Multi-modal Large Language Models via Visual Token Grouping
Minbin Huang
Runhui Huang
Han Shi
Yimeng Chen
Chuanyang Zheng
Xiangguo Sun
Xin Jiang
Zhiyu Li
Hong Cheng
VLM
95
3
0
26 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual
  Token Compression
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
83
9
0
21 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
101
0
0
20 Nov 2024
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for
  Vision-Language Model Inference Acceleration
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration
Dezhan Tu
Danylo Vashchilenko
Yuzhe Lu
Panpan Xu
VLM
48
9
0
29 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision
  Foundation Models: A Survey from the Autoregression Perspective
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
54
3
0
29 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
53
29
0
22 Oct 2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video
  Even in VLMs
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
Manli Shu
Silvio Savarese
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
53
13
0
21 Oct 2024
Efficient Vision-Language Models by Summarizing Visual Tokens into
  Compact Registers
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
Yuxin Wen
Qingqing Cao
Qichen Fu
Sachin Mehta
Mahyar Najibi
VLM
25
4
0
17 Oct 2024
Active-Dormant Attention Heads: Mechanistically Demystifying
  Extreme-Token Phenomena in LLMs
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
Tianyu Guo
Druv Pai
Yu Bai
Jiantao Jiao
Michael I. Jordan
Song Mei
34
10
0
17 Oct 2024
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Chenxi Wang
Xiang Chen
N. Zhang
Bozhong Tian
Haoming Xu
Shumin Deng
Huajun Chen
MLLM
LRM
46
4
0
15 Oct 2024
ControlMM: Controllable Masked Motion Generation
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Chong Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
42
4
0
14 Oct 2024
ZipVL: Efficient Large Vision-Language Models with Dynamic Token
  Sparsification and KV Cache Compression
ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression
Yefei He
Feng Chen
Jing Liu
Wenqi Shao
Hong Zhou
Kaipeng Zhang
Bohan Zhuang
VLM
55
12
0
11 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
88
0
0
06 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
89
26
0
04 Oct 2024
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity
Zhibin Lan
Liqiang Niu
Fandong Meng
Wenbo Li
Jie Zhou
Jinsong Su
VLM
38
2
0
20 Sep 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths
  Vision Computation
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
59
12
0
29 Aug 2024
Semantic Alignment for Multimodal Large Language Models
Semantic Alignment for Multimodal Large Language Models
Tao Wu
Mengze Li
Jingyuan Chen
Wei Ji
Wang Lin
Jinyang Gao
Kun Kuang
Zhou Zhao
Fei Wu
49
5
0
23 Aug 2024
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
Feipeng Ma
Yizhou Zhou
Hebei Li
Zilong He
Siying Wu
Fengyun Rao
Siying Wu
Fengyun Rao
Yueyi Zhang
Xiaoyan Sun
41
4
0
21 Aug 2024
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language
  Models
Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models
Mingxin Huang
Yuliang Liu
Dingkang Liang
Lianwen Jin
Xiang Bai
58
10
0
04 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
68
19
0
04 Aug 2024
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large
  Language Models
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models
Ming-Kuan Wu
Xinyue Cai
Jiayi Ji
Jiale Li
Oucheng Huang
Gen Luo
Hao Fei
Xiaoshuai Sun
Rongrong Ji
MLLM
59
7
0
31 Jul 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
36
3
0
28 Jul 2024
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of
  Multimodal Models
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models
Pengxiang Li
Zhi Gao
Bofei Zhang
Tao Yuan
Yuwei Wu
Mehrtash Harandi
Yunde Jia
Song-Chun Zhu
Qing Li
VLM
MLLM
48
5
0
16 Jul 2024
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation
Jinsheng Huang
Liang Chen
Taian Guo
Fu Zeng
Yusheng Zhao
...
Wei Ju
Luchen Liu
Tianyu Liu
Baobao Chang
Ming Zhang
46
5
0
29 Jun 2024
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context
  Compression
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression
Jieneng Chen
Luoxin Ye
Ju He
Zhao-Yang Wang
Daniel Khashabi
Alan Yuille
VLM
27
5
0
28 Jun 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding
  with Efficient Visual Slimming
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
42
15
0
27 Jun 2024
Long Context Transfer from Language to Vision
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
72
146
0
24 Jun 2024
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Tianren Ma
Lingxi Xie
Yunjie Tian
Boyu Yang
Yuan Zhang
44
0
0
17 Jun 2024
From Redundancy to Relevance: Enhancing Explainability in Multimodal
  Large Language Models
From Redundancy to Relevance: Enhancing Explainability in Multimodal Large Language Models
Xiaofeng Zhang
Chen Shen
Xiaosong Yuan
Shaotian Yan
Liang Xie
Wenxiao Wang
Chaochen Gu
Hao Tang
Jieping Ye
54
2
0
04 Jun 2024
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling
Zefan Cai
Yichi Zhang
Bofei Gao
Yuliang Liu
Yong Li
...
Wayne Xiong
Yue Dong
Baobao Chang
Junjie Hu
Wen Xiao
75
86
0
04 Jun 2024
Visual Anchors Are Strong Information Aggregators For Multimodal Large
  Language Model
Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model
Haogeng Liu
Quanzeng You
Xiaotian Han
Yongfei Liu
Huaibo Huang
Ran He
Hongxia Yang
33
2
0
28 May 2024
Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning
Empowering Source-Free Domain Adaptation via MLLM-Guided Reliability-Based Curriculum Learning
Dongjie Chen
Kartik Patwari
Zhengfeng Lai
S. Cheung
Chen-Nee Chuah
Chen-Nee Chuah
53
0
0
28 May 2024
Matryoshka Multimodal Models
Matryoshka Multimodal Models
Mu Cai
Jianwei Yang
Jianfeng Gao
Yong Jae Lee
VLM
60
28
0
27 May 2024
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal
  Models
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
Chunjiang Ge
Sijie Cheng
Ziming Wang
Jiale Yuan
Yuan Gao
Jun Song
Shiji Song
Gao Huang
Bo Zheng
MLLM
VLM
36
17
0
24 May 2024
MiniCache: KV Cache Compression in Depth Dimension for Large Language
  Models
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
Akide Liu
Jing Liu
Zizheng Pan
Yefei He
Gholamreza Haffari
Bohan Zhuang
MQ
35
30
0
23 May 2024
Efficient Multimodal Large Language Models: A Survey
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
47
47
0
17 May 2024
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient
  Task Adaptation
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong
Hui Chen
Tianxiang Hao
Zijia Lin
Jungong Han
Yuesong Zhang
Guoxin Wang
Yongjun Bao
Guiguang Ding
51
17
0
14 Mar 2024
Silkie: Preference Distillation for Large Visual Language Models
Silkie: Preference Distillation for Large Visual Language Models
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
Liang Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
MLLM
117
69
0
17 Dec 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
209
603
0
16 Nov 2023
Previous
123
Next