ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.17530
  4. Cited By
PuMer: Pruning and Merging Tokens for Efficient Vision Language Models

PuMer: Pruning and Merging Tokens for Efficient Vision Language Models

27 May 2023
Qingqing Cao
Bhargavi Paranjape
Hannaneh Hajishirzi
    MLLM
    VLM
ArXivPDFHTML

Papers citing "PuMer: Pruning and Merging Tokens for Efficient Vision Language Models"

19 / 19 papers shown
Title
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
STAR: Stage-Wise Attention-Guided Token Reduction for Efficient Large Vision-Language Models Inference
Yichen Guo
Hanze Li
Zonghao Zhang
Jinhao You
Kai Tang
Xiande Huang
VLM
18
0
0
18 May 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
Lefei Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
54
0
0
01 Apr 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
43
0
0
30 Mar 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
110
0
0
26 Mar 2025
A Survey on Mamba Architecture for Vision Applications
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
64
3
0
11 Feb 2025
Deploying Foundation Model Powered Agent Services: A Survey
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
119
1
0
18 Dec 2024
Efficient Online Inference of Vision Transformers by Training-Free
  Tokenization
Efficient Online Inference of Vision Transformers by Training-Free Tokenization
Leonidas Gee
Wing Yan Li
V. Sharmanska
Novi Quadrianto
ViT
93
0
0
23 Nov 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
53
28
0
22 Oct 2024
Efficient Vision-Language Models by Summarizing Visual Tokens into
  Compact Registers
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
Yuxin Wen
Qingqing Cao
Qichen Fu
Sachin Mehta
Mahyar Najibi
VLM
25
4
0
17 Oct 2024
Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion
Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion
Hui Shen
Zhongwei Wan
Xin Wang
Mi Zhang
Mamba
32
6
0
15 Sep 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
68
19
0
04 Aug 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
45
43
0
09 Jul 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic
  Segmentation with Plain Vision Transformers
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
50
4
0
14 Jun 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
46
8
0
25 May 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for
  Faster Inference
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced
  Training
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
36
43
0
28 Nov 2023
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient
  Vision-Language Models
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models
Zekun Wang
Jingchang Chen
Wangchunshu Zhou
Haichao Zhu
Jiafeng Liang
Liping Shan
Ming Liu
Dongliang Xu
Qing Yang
Bing Qin
VLM
26
4
0
24 May 2023
GroupViT: Semantic Segmentation Emerges from Text Supervision
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
198
504
0
22 Feb 2022
Token Pooling in Vision Transformers
Token Pooling in Vision Transformers
D. Marin
Jen-Hao Rick Chang
Anurag Ranjan
Anish K. Prabhu
Mohammad Rastegari
Oncel Tuzel
ViT
76
66
0
08 Oct 2021
1