PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

22 October 2024
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
Yuhang Zang
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
    VLM

Papers citing "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction"

50 / 85 papers shown
VScan: Rethinking Visual Token Reduction for Efficient Large Vision-Language Models
Ce Zhang
Kaixin Ma
Tianqing Fang
Wenhao Yu
Hongming Zhang
Zhisong Zhang
Yaqi Xie
Katia Sycara
Haitao Mi
Dong Yu
VLM
52
0
0
28 May 2025
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao
Keda Tao
Can Qin
Haoxuan You
Yang Sui
Huan Wang
VLM
37
0
0
27 May 2025
AdaTP: Attention-Debiased Token Pruning for Video Large Language Models
Fengyuan Sun
Leqi Shen
Hui Chen
Sicheng Zhao
Jungong Han
Guiguang Ding
VLM
22
0
0
26 May 2025
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Benjamin Schneider
Dongfu Jiang
Chao Du
Tianyu Pang
Wenhu Chen
VLM
27
0
0
22 May 2025
Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
Penghao Wu
Lewei Lu
Ziwei Liu
89
0
0
21 May 2025
AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning
Kai Zhang
Xingyu Chen
Xiaofeng Zhang
62
0
0
19 May 2025
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
Yichen Guo
Hanze Li
Zonghao Zhang
Jinhao You
Kai Tang
Xiande Huang
VLM
61
0
0
18 May 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
227
1
0
28 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
129
2
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
87
1
0
22 Apr 2025
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation
Hanning Chen
Yang Ni
Wenjun Huang
Hyunwoo Oh
Yezi Liu
Tamoghno Das
Mohsen Imani
VLM
LRM
74
0
0
15 Apr 2025
On-device Sora: Enabling Training-Free Diffusion-based Text-to-Video Generation for Mobile Devices
Bosung Kim
Kyuhwan Lee
Isu Jeong
Jungmin Cheon
Yeojin Lee
Seulki Lee
VGen
81
0
0
31 Mar 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
145
0
0
26 Mar 2025
Beyond Intermediate States: Explaining Visual Redundancy through Language
Dingchen Yang
Bowen Cao
Anran Zhang
Weibo Gu
Winston Hu
Guang Chen
VLM
101
0
0
26 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Yaojie Lu
Sifei Liu
...
Jan Kautz
Enze Xie
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
344
0
0
25 Mar 2025
Growing a Twig to Accelerate Large Vision-Language Models
Zhenwei Shao
Mingyang Wang
Zhou Yu
Wenwen Pan
Yan Yang
Tao Wei
Hao Zhang
Ning Mao
Wei Chen
Jun Yu
VLM
81
2
0
18 Mar 2025
AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
109
7
0
16 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
87
2
0
14 Mar 2025
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen
Guoqiang Gong
Tao He
Yifeng Zhang
Pengzhang Liu
Sicheng Zhao
Guiguang Ding
VLM
86
2
0
14 Mar 2025
TokenCarve: Information-Preserving Visual Token Compression in Multimodal Large Language Models
Xudong Tan
Peng Ye
Chongjun Tu
Jianjian Cao
Yaoxin Yang
Lin Zhang
Dongzhan Zhou
Tao Chen
VLM
123
2
0
13 Mar 2025
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers
Ruanjun Li
Yuedong Tan
Yuanming Shi
Jiawei Shao
VLM
308
0
0
12 Mar 2025
Multi-Cue Adaptive Visual Token Pruning for Large Vision-Language Models
Bozhi Luan
Wengang Zhou
Hao Feng
Zhe Wang
Xiaosong Li
Haoyang Li
VLM
97
0
0
11 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
Xiaoyu Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
116
1
0
10 Mar 2025
RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs
Hongliang Li
Jiaxin Zhang
Wenhui Liao
Dezhi Peng
Kai Ding
Lianwen Jin
OffRL
MQ
118
0
0
31 Jan 2025
ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
Xiao Wang
Qingyi Si
Jianlong Wu
Shiyu Zhu
Zheng Lin
Liqiang Nie
VLM
117
7
0
29 Dec 2024
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong
Zhuoming Liu
Yin Li
Liwei Wang
107
7
0
04 Dec 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
Weiming Zhang
Nenghai Yu
61
9
0
09 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
111
0
0
06 Oct 2024
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
Kazi Hasan Ibn Arif
JinYi Yoon
Dimitrios S. Nikolopoulos
Hans Vandierendonck
Deepu John
Bo Ji
MLLM
VLM
71
16
0
20 Aug 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
Jiabo Ye
Haiyang Xu
Haowei Liu
Anwen Hu
Ming Yan
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
67
126
0
09 Aug 2024
MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
...
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
VLM
MLLM
101
439
0
03 Aug 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
122
161
0
16 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
85
111
0
03 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
71
31
0
01 Jul 2024
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression
Jieneng Chen
Luoxin Ye
Ju He
Zhao-Yang Wang
Daniel Khashabi
Alan Yuille
VLM
44
5
0
28 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV
MLLM
95
344
0
24 Jun 2024
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
Yuxuan Qiao
Haodong Duan
Xinyu Fang
Junming Yang
Lin Chen
Songyang Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LRM
73
23
0
20 Jun 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs
Ziyu Liu
Tao Chu
Yuhang Zang
Xilin Wei
Xiaoyi Dong
...
Zijian Liang
Yuanjun Xiong
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
65
40
0
17 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
107
167
0
06 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
55
33
0
31 May 2024
Are We on the Right Way for Evaluating Large Vision-Language Models?
Lin Chen
Jinsong Li
Xiao-wen Dong
Pan Zhang
Yuhang Zang
...
Haodong Duan
Jiaqi Wang
Yu Qiao
Dahua Lin
Feng Zhao
VLM
100
274
0
29 Mar 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
98
121
0
22 Mar 2024
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang
Pan Zhang
Xiao-wen Dong
Yuhang Zang
Jiaqi Wang
CLIP
VLM
72
132
0
22 Mar 2024
RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition
Ziyu Liu
Zeyi Sun
Yuhang Zang
Wei Li
Pan Zhang
Xiao-wen Dong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
24
13
0
20 Mar 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Anwen Hu
Haiyang Xu
Jiabo Ye
Mingshi Yan
Liang Zhang
...
Chen Li
Ji Zhang
Qin Jin
Fei Huang
Jingren Zhou
VLM
82
117
0
19 Mar 2024
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
Ruyi Xu
Yuan Yao
Zonghao Guo
Junbo Cui
Zanlin Ni
Chunjiang Ge
Tat-Seng Chua
Zhiyuan Liu
Maosong Sun
Gao Huang
VLM
MLLM
82
112
0
18 Mar 2024
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong
Hui Chen
Tianxiang Hao
Zijia Lin
Jungong Han
Yuesong Zhang
Guoxin Wang
Yongjun Bao
Guiguang Ding
76
17
0
14 Mar 2024
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen
Haozhe Zhao
Tianyu Liu
Shuai Bai
Junyang Lin
Chang Zhou
Baobao Chang
MLLM
VLM
91
139
0
11 Mar 2024
SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition
Shuangrui Ding
Zihan Liu
Xiao-wen Dong
Pan Zhang
Rui Qian
Junhao Huang
Conghui He
Jiaqi Wang
80
0
0
27 Feb 2024
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
86
77
0
13 Feb 2024