
Token Merging: Your ViT But Faster (arXiv:2210.09461)
17 October 2022
Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman
MoMe
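As context for the list below: the cited paper speeds up ViTs by gradually merging similar tokens between transformer blocks via a bipartite soft-matching step, rather than pruning them. The following PyTorch sketch illustrates that matching step using cosine similarity of token features; it is a minimal illustration under assumed shapes, not the authors' reference implementation (which matches on attention keys and tracks token sizes for proportional attention).

import torch
import torch.nn.functional as F

def bipartite_soft_matching(x: torch.Tensor, r: int) -> torch.Tensor:
    # Remove r tokens from x of shape [N, C] by merging each of the r
    # best-matched tokens in set A into its most similar token in set B.
    a, b = x[::2], x[1::2]                                   # alternate tokens into A and B
    sim = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T  # cosine similarity, [N/2, N/2]
    best_sim, best_dst = sim.max(dim=-1)                     # each A-token's closest B-token
    order = best_sim.argsort(descending=True)                # most redundant pairs first
    merged, kept = order[:r], order[r:]                      # merge away the top-r A-tokens
    b = b.clone()
    # Average each merged A-token into its destination; ties to one
    # destination share a single mean.
    b.index_reduce_(0, best_dst[merged], a[merged], "mean", include_self=True)
    return torch.cat([a[kept], b], dim=0)                    # [N - r, C]

# Example: merging r=16 of 196 patch tokens leaves 180.
tokens = torch.randn(196, 384)
print(bipartite_soft_matching(tokens, r=16).shape)           # torch.Size([180, 384])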

Papers citing "Token Merging: Your ViT But Faster"

50 / 321 papers shown
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu · RALM · 10 Oct 2023

Latent Diffusion Counterfactual Explanations
Karim Farid, Simon Schrodi, Max Argus, Thomas Brox · DiffM · 10 Oct 2023

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu · 09 Oct 2023

Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian, Tingkai Liu, Yunzhe Tao, Chunhui Zhang, Soroush Vosoughi, HX Yang · VLM · 05 Oct 2023

ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
Yi-Lin Sung, Jaehong Yoon, Mohit Bansal · VLM · 04 Oct 2023

SlowFormer: Universal Adversarial Patch for Attack on Compute and Energy Efficiency of Inference Efficient Vision Transformers
K. Navaneet, Soroush Abbasi Koohpayegani, Essam Sleiman, Hamed Pirsiavash · AAML, ViT · 04 Oct 2023

PPT: Token Pruning and Pooling for Efficient Vision Transformers
Xinjian Wu, Fanhu Zeng, Xiudong Wang, Xinghao Chen · ViT · 03 Oct 2023

SEA: Sparse Linear Attention with Estimated Attention Mask
Heejun Lee, Jina Kim, Jeffrey Willette, Sung Ju Hwang · 03 Oct 2023

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models
Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi · MQ, RALM · 02 Oct 2023

Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy, Jérôme Revaud, Thomas Lucas, Philippe Weinzaepfel · ViT · 01 Oct 2023

ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens
Yangyang Guo, Haoyu Zhang, Yongkang Wong, Liqiang Nie, Mohan S. Kankanhalli · VLM · 28 Sep 2023

CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs
Ao Wang, Hui Chen, Zijia Lin, Sicheng Zhao, J. Han, Guiguang Ding · ViT · 27 Sep 2023

LLMCad: Fast and Scalable On-device Large Language Model Inference
Daliang Xu, Wangsong Yin, Xin Jin, Yuhang Zhang, Shiyun Wei, Mengwei Xu, Xuanzhe Liu · 08 Sep 2023

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
Lorenzo Papa, Paolo Russo, Irene Amerini, Luping Zhou · 05 Sep 2023

DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang · ViT · 04 Sep 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson, Yin Li, M. Gupta · ViT · 25 Aug 2023

ConcatPlexer: Additional Dim1 Batching for Faster ViTs
D. Han, Seunghyeon Seo, D. Jeon, Jiho Jang, Chaerin Kong, Nojun Kwak · ViT, MoE · 22 Aug 2023

DiffLLE: Diffusion-guided Domain Calibration for Unsupervised Low-light Image Enhancement
Shuzhou Yang, Xuanyu Zhang, Yinhuai Wang, Jiwen Yu, Yuhan Wang, Jian Zhang · DiffM · 18 Aug 2023

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen, Sebastián M. Palacio, Federico Raue, Andreas Dengel · 18 Aug 2023

Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor, T. Moeslund · ViT · 09 Aug 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, H. Xiong, Qi Tian · ViT · 08 Aug 2023

MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, ..., Tianbo Ye, Yanting Zhang, Yang Lu, Jenq-Neng Hwang, Gaoang Wang · VLM, MLLM · 31 Jul 2023

Learned Thresholds Token Merging and Pruning for Vision Transformers
Maxim Bonnaerens, J. Dambre · 20 Jul 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby · ViT · 12 Jul 2023

MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, B. Bejnordi · 05 Jul 2023

Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu, Yichen Zhu · ViT · 05 Jul 2023

Accelerating Transducers through Adjacent Token Merging
Yuang Li, Yu-Huan Wu, Jinyu Li, Shujie Liu · 28 Jun 2023

Adaptive Window Pruning for Efficient Local Motion Deblurring
Haoying Li, Jixin Zhao, Shangchen Zhou, H. Feng, Chongyi Li, Chen Change Loy · ViT · 25 Jun 2023

How can objects help action recognition?
Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid · 20 Jun 2023

Scaling Open-Vocabulary Object Detection
Matthias Minderer, A. Gritsenko, N. Houlsby · VLM, ObjD · 16 Jun 2023

Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion
Isha Rawal, Alexander Matyasko, Shantanu Jaiswal, Basura Fernando, Cheston Tan · 15 Jun 2023

Revisiting Token Pruning for Object Detection and Instance Segmentation
Yifei Liu, Mathias Gehrig, Nico Messikommer, Marco Cannici, Davide Scaramuzza · ViT, VLM · 12 Jun 2023

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You, Huihong Shi, Yipin Guo, Yingyan Lin · 10 Jun 2023

Exploring Effective Mask Sampling Modeling for Neural Image Compression
Lin Liu, Mingming Zhao, Shanxin Yuan, Wenlong Lyu, Wen-gang Zhou, Houqiang Li, Yanfeng Wang, Qi Tian · 09 Jun 2023

Multi-Scale And Token Mergence: Make Your ViT More Efficient
Zhe Bian, Zhe Wang, Wenqiang Han, Kangping Wang · 08 Jun 2023

Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers
Chenyang Lu, Daan de Geus, Gijs Dubbelman · ViT · 03 Jun 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting
Shubin Huang, Qiong Wu, Yiyi Zhou, Weijie Chen, Rongsheng Zhang, Xiaoshuai Sun, Rongrong Ji · VLM, VPVLM, LRM · 01 Jun 2023

Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam, Steven Kolawole, S. Sethi, Nishant Bansali, Karina Nguyen · ViT · 30 May 2023

DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
Mengzhao Chen, Wenqi Shao, Peng Xu, Mingbao Lin, Kaipeng Zhang, Rongrong Ji, Yu Qiao, Ping Luo · ViT · 29 May 2023

PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
Qingqing Cao, Bhargavi Paranjape, Hannaneh Hajishirzi · MLLM, VLM · 27 May 2023

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers
Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang · VLM · 27 May 2023

Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
Zihao Yu, Haoyang Li, Fangcheng Fu, Xupeng Miao, Bin Cui · DiffM · 27 May 2023

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang, Bhishma Dedhia, N. Jha · ViT, VLM · 27 May 2023

COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan · VLM, ViT · 26 May 2023

Do We Really Need a Large Number of Visual Prompts?
Youngeun Kim, Yuhang Li, Abhishek Moitra, Ruokai Yin, Priyadarshini Panda · VLM, VPVLM · 26 May 2023

Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers
Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann · 25 May 2023

BinaryViT: Towards Efficient and Accurate Binary Vision Transformers
Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu · MQ, ViT · 24 May 2023

LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis
Chang-Shu Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu · DiffM · 19 May 2023

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention
Guangxuan Xiao, Tianwei Yin, William T. Freeman, F. Durand, Song Han · VGen, DiffM · 17 May 2023

ZipIt! Merging Models from Different Tasks without Training
George Stoica, Daniel Bolya, J. Bjorner, Pratik Ramesh, Taylor N. Hearn, Judy Hoffman · VLM, MoMe · 04 May 2023