Token Merging: Your ViT But Faster

17 October 2022
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
    MoMe

Papers citing "Token Merging: Your ViT But Faster"

50 / 321 papers shown
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
Nick Eliopoulos
Purvish Jajal
James Davis
Gaowen Liu
George K. Thiravathukal
Yung-Hsiang Lu
43
1
0
01 Jul 2024
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models
Chang-Han Yeh
Chin-Yang Lin
Zhixiang Wang
Chi-Wei Hsiao
Ting-Hsuan Chen
Hau-Shiang Shiu
Yu-Lun Liu
VGen
DiffM
57
5
0
01 Jul 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Zhongwei Wan
Ziang Wu
Che Liu
Jinfa Huang
Zhihong Zhu
Peng Jin
Longyue Wang
Li Yuan
VLM
38
28
0
26 Jun 2024
Diffusion Model-Based Video Editing: A Survey
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Dacheng Tao
VGen
66
22
0
26 Jun 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
42
7
0
26 Jun 2024
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan
Xinjian Wu
Yu Zhang
Yi Xin
Chaofan Tao
...
Xin Wang
Siqi Luo
Jing Xiong
Mi Zhang
Mi Zhang
29
0
0
18 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
48
4
0
14 Jun 2024
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
Jiangshan Wang
Yue Ma
Jiayi Guo
Yicheng Xiao
Gao Huang
Xiu Li
DiffM
28
17
0
13 Jun 2024
ToSA: Token Selective Attention for Efficient Vision Transformers
Manish Kumar Singh
R. Yasarla
Hong Cai
Mingu Lee
Fatih Porikli
62
0
0
13 Jun 2024
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Hector A. Valdez
Kyle Min
Subarna Tripathi
VLM
44
1
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
Swarup Ranjan Behera
Abhishek Dhiman
Karthik Gowda
Aalekhya Satya Narayani
21
1
0
11 Jun 2024
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
Jun Gao
Qian Qiao
Ziqiang Cao
Zili Wang
Wenjie Li
28
3
0
11 Jun 2024
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
Hoang H. Le
D. M. Nguyen
Omair Shahzad Bhatti
Laszlo Kopacsi
Thinh P. Ngo
Binh T. Nguyen
Michael Barz
Daniel Sonntag
53
0
0
10 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
63
32
0
07 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Yu Qiao
Hongsheng Li
Peng Gao
52
44
0
05 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Qi Zhang
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
61
3
0
03 Jun 2024
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun
Mihyeon Kim
Youngbin Kim
74
9
0
03 Jun 2024
A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies
Jinchao Zhu
Yuxuan Wang
Siyuan Pan
Pengfei Wan
Di Zhang
Gao Huang
26
0
0
31 May 2024
Automatic Channel Pruning for Multi-Head Attention
Eunho Lee
Youngbae Hwang
ViT
40
1
0
31 May 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Jingyun Liang
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
N. Sebe
45
5
0
30 May 2024
Matryoshka Query Transformer for Large Vision-Language Models
Wenbo Hu
Zi-Yi Dou
Liunian Harold Li
Amita Kamath
Nanyun Peng
Kai-Wei Chang
MLLM
36
8
0
29 May 2024
Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
24
1
0
28 May 2024
Matryoshka Multimodal Models
Mu Cai
Jianwei Yang
Jianfeng Gao
Yong Jae Lee
VLM
45
25
0
27 May 2024
Unifying Demonstration Selection and Compression for In-Context Learning
Jun Gao
Ziqiang Cao
Wenjie Li
38
3
0
27 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
39
8
0
25 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
39
40
0
25 May 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang
Akio Kodaira
Chenfeng Xu
M. Tomizuka
Kurt Keutzer
Diana Marculescu
DiffM
VGen
70
7
0
24 May 2024
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation
Daniel Kienzle
Marco Kantonis
Robin Schon
Rainer Lienhart
35
2
0
23 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
51
0
0
22 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
37
5
0
16 May 2024
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang
Difan Liu
Yan Kang
Yijun Li
Zhe Lin
N. Jha
Yuchen Liu
31
13
0
08 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
37
30
0
26 Apr 2024
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
Liang Zhang
Anwen Hu
Haiyang Xu
Mingshi Yan
Yichen Xu
Qin Jin
Ji Zhang
Fei Huang
51
15
0
25 Apr 2024
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
Kyunghwan Shim
Jaewoong Yun
Shinkook Choi
25
0
0
18 Apr 2024
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song
Seunghyuk Oh
Sangwoo Mo
Jaehyung Kim
Sukmin Yun
Jung-Woo Ha
Jinwoo Shin
30
14
0
16 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
51
2
0
15 Apr 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
30
2
0
14 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
43
24
0
10 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
83
88
0
08 Apr 2024
MLP Can Be A Good Transformer Learner
Sihao Lin
Pumeng Lyu
Dongrui Liu
Tao Tang
Xiaodan Liang
Andy Song
Xiaojun Chang
ViT
37
11
0
08 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
68
56
0
04 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
41
32
0
01 Apr 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu-Gang Jiang
SSL
24
1
0
31 Mar 2024
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang
Yunhang Shen
Jiao Xie
Baochang Zhang
Gaoqi He
Ke Li
Xing Sun
Shaohui Lin
42
3
0
31 Mar 2024
The Need for Speed: Pruning Transformers with One Recipe
Samir Khaki
Konstantinos N. Plataniotis
32
10
0
26 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
P. Beerel
37
2
0
25 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
31
107
0
22 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
55
12
0
20 Mar 2024