Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.15388
Cited By
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
22 March 2024
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models"
39 / 89 papers shown
Title
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Te Yang
Jian Jia
Xiangyu Zhu
Weisong Zhao
Bo Wang
...
Shengyuan Liu
Quan Chen
Peng Jiang
Kun Gai
Zhen Lei
66
1
0
23 Nov 2024
freePruner: A Training-free Approach for Large Multimodal Model Acceleration
Bingxin Xu
Yuzhang Shang
Yunhao Ge
Qian Lou
Yan Yan
97
3
0
23 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
83
9
0
21 Nov 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
50
28
0
22 Oct 2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
Manli Shu
Silvio Savarese
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
46
13
0
21 Oct 2024
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang
Dong Gong
Sen Wang
Zi Huang
Yadan Luo
VLM
36
0
0
16 Oct 2024
Spatial-Aware Efficient Projector for MLLMs via Multi-Layer Feature Aggregation
Shun Qian
Bingquan Liu
Chengjie Sun
Zhen Xu
Baoxun Wang
36
0
0
14 Oct 2024
Retrieval Replace Reduction: An effective visual token reduction method via semantic match
Yingen Liu
Fan Wu
Ruihui Li
Zhuo Tang
KenLi Li
VLM
29
0
0
09 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
31
7
0
08 Oct 2024
Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See
Phu Pham
Phu Pham
Kun Wan
Yu-Jhe Li
Zeliang Zhang
Daniel Miranda
Ajinkya Kale
Ajinkya Kale
Chenliang Xu
35
5
0
08 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
86
26
0
04 Oct 2024
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu
Jiaqi Song
Chao-Han Huck Yang
Shinji Watanabe
28
0
0
03 Oct 2024
AVG-LLaVA: A Large Multimodal Model with Adaptive Visual Granularity
Zhibin Lan
Liqiang Niu
Fandong Meng
Wenbo Li
Jie Zhou
Jinsong Su
VLM
38
2
0
20 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
38
7
0
19 Sep 2024
VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Yezi Liu
SungHeon Jeong
Fei Wen
Nathaniel D. Bastian
Hugo Latapie
Mohsen Imani
VLM
32
4
0
13 Sep 2024
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
Yi Chen
Jian Xu
Xu-Yao Zhang
Wen-Zhuo Liu
Yang-Yang Liu
Cheng-Lin Liu
31
3
0
02 Sep 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
52
12
0
29 Aug 2024
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Ke Lin
Jiahao Wang
Xin Tao
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
LRM
39
6
0
21 Aug 2024
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Kazi Hasan Ibn Arif
JinYi Yoon
Dimitrios S. Nikolopoulos
Hans Vandierendonck
Deepu John
Bo Ji
MLLM
VLM
53
14
0
20 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
66
19
0
04 Aug 2024
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
Renshan Zhang
Yibo Lyu
Rui Shao
Gongwei Chen
Weili Guan
Liqiang Nie
39
9
0
19 Jul 2024
ACTRESS: Active Retraining for Semi-supervised Visual Grounding
Weitai Kang
Mengxue Qu
Yunchao Wei
Yan Yan
41
6
0
03 Jul 2024
Visual Grounding with Attention-Driven Constraint Balancing
Weitai Kang
Luowei Zhou
Junyi Wu
Changchang Sun
Yan Yan
45
4
0
03 Jul 2024
SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
Weitai Kang
Gaowen Liu
Mubarak Shah
Yan Yan
ObjD
41
9
0
03 Jul 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
37
53
0
02 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
37
15
0
27 Jun 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Zhongwei Wan
Ziang Wu
Che Liu
Jinfa Huang
Zhihong Zhu
Peng Jin
Longyue Wang
Li Yuan
VLM
41
29
0
26 Jun 2024
Long Context Transfer from Language to Vision
Peiyuan Zhang
Kaichen Zhang
Bo Li
Guangtao Zeng
Jingkang Yang
Yuanhan Zhang
Ziyue Wang
Haoran Tan
Chunyuan Li
Ziwei Liu
VLM
69
141
0
24 Jun 2024
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
Bingchen Zhao
Yongshuo Zong
Letian Zhang
Timothy Hospedales
VLM
33
15
0
18 Jun 2024
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Tianren Ma
Lingxi Xie
Yunjie Tian
Boyu Yang
Yuan Zhang
44
0
0
17 Jun 2024
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang
Mengxue Qu
Jyoti Kini
Yunchao Wei
Mubarak Shah
Yan Yan
LM&Ro
3DPC
53
10
0
28 May 2024
Matryoshka Multimodal Models
Mu Cai
Jianwei Yang
Jianfeng Gao
Yong Jae Lee
VLM
53
25
0
27 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
47
45
0
17 May 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
40
0
23 Jan 2024
LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
Yichen Zhu
Minjie Zhu
Ning Liu
Zhicai Ou
Xiaofeng Mou
Jian Tang
74
92
0
04 Jan 2024
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
197
595
0
16 Nov 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
290
4,261
0
30 Jan 2023
Adaptive Sparse ViT: Towards Learnable Adaptive Token Pruning by Fully Exploiting Self-Attention
Xiangcheng Liu
Tianyi Wu
Guodong Guo
ViT
48
26
0
28 Sep 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,113
0
20 Sep 2022
Previous
1
2