Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.09461
Cited By
Token Merging: Your ViT But Faster
17 October 2022
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Token Merging: Your ViT But Faster"
50 / 321 papers shown
Title
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
Nick Eliopoulos
Purvish Jajal
James Davis
Gaowen Liu
George K. Thiravathukal
Yung-Hsiang Lu
43
1
0
01 Jul 2024
DiffIR2VR-Zero: Zero-Shot Video Restoration with Diffusion-based Image Restoration Models
Chang-Han Yeh
Chin-Yang Lin
Zhixiang Wang
Chi-Wei Hsiao
Ting-Hsuan Chen
Hau-Shiang Shiu
Yu-Lun Liu
VGen
DiffM
57
5
0
01 Jul 2024
LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference
Zhongwei Wan
Ziang Wu
Che Liu
Jinfa Huang
Zhihong Zhu
Peng Jin
Longyue Wang
Li Yuan
VLM
38
28
0
26 Jun 2024
Diffusion Model-Based Video Editing: A Survey
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Dacheng Tao
VGen
66
22
0
26 Jun 2024
ScanFormer: Referring Expression Comprehension by Iteratively Scanning
Wei Su
Peihan Miao
Huanzhang Dou
Xi Li
ObjD
42
7
0
26 Jun 2024
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan
Xinjian Wu
Yu Zhang
Yi Xin
Chaofan Tao
...
Xin Wang
Siqi Luo
Jing Xiong
Mi Zhang
Mi Zhang
29
0
0
18 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT
FedML
48
4
0
14 Jun 2024
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing
Jiangshan Wang
Yue Ma
Jiayi Guo
Yicheng Xiao
Gao Huang
Xiu Li
DiffM
28
17
0
13 Jun 2024
ToSA: Token Selective Attention for Efficient Vision Transformers
Manish Kumar Singh
R. Yasarla
Hong Cai
Mingu Lee
Fatih Porikli
62
0
0
13 Jun 2024
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Hector A. Valdez
Kyle Min
Subarna Tripathi
VLM
44
1
0
13 Jun 2024
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
Yi-Fan Zhang
Qingsong Wen
Chaoyou Fu
Xue Wang
Zhang Zhang
L. Wang
Rong Jin
34
40
0
12 Jun 2024
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
Swarup Ranjan Behera
Abhishek Dhiman
Karthik Gowda
Aalekhya Satya Narayani
21
1
0
11 Jun 2024
AIM: Let Any Multi-modal Large Language Models Embrace Efficient In-Context Learning
Jun Gao
Qian Qiao
Ziqiang Cao
Zili Wang
Wenjie Li
28
3
0
11 Jun 2024
I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data
Hoang H. Le
D. M. Nguyen
Omair Shahzad Bhatti
Laszlo Kopacsi
Thinh P. Ngo
Binh T. Nguyen
Michael Barz
Daniel Sonntag
53
0
0
10 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
63
32
0
07 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Yu Qiao
Hongsheng Li
Peng Gao
52
44
0
05 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Qi Zhang
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
61
3
0
03 Jun 2024
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun
Mihyeon Kim
Youngbin Kim
74
9
0
03 Jun 2024
A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies
Jinchao Zhu
Yuxuan Wang
Siyuan Pan
Pengfei Wan
Di Zhang
Gao Huang
26
0
0
31 May 2024
Automatic Channel Pruning for Multi-Head Attention
Eunho Lee
Youngbae Hwang
ViT
40
1
0
31 May 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Jingyun Liang
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
N. Sebe
45
5
0
30 May 2024
Matryoshka Query Transformer for Large Vision-Language Models
Wenbo Hu
Zi-Yi Dou
Liunian Harold Li
Amita Kamath
Nanyun Peng
Kai-Wei Chang
MLLM
36
8
0
29 May 2024
Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Leon Götz
Marcel Kollovieh
Stephan Günnemann
Leo Schwinn
24
1
0
28 May 2024
Matryoshka Multimodal Models
Mu Cai
Jianwei Yang
Jianfeng Gao
Yong Jae Lee
VLM
45
25
0
27 May 2024
Unifying Demonstration Selection and Compression for In-Context Learning
Jun Gao
Ziqiang Cao
Wenjie Li
38
3
0
27 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
39
8
0
25 May 2024
Streaming Long Video Understanding with Large Language Models
Rui Qian
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Shuangrui Ding
Dahua Lin
Jiaqi Wang
VLM
39
40
0
25 May 2024
Looking Backward: Streaming Video-to-Video Translation with Feature Banks
Feng Liang
Akio Kodaira
Chenfeng Xu
M. Tomizuka
Kurt Keutzer
Diana Marculescu
DiffM
VGen
70
7
0
24 May 2024
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation
Daniel Kienzle
Marco Kantonis
Robin Schon
Rainer Lienhart
35
2
0
23 May 2024
Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
51
0
0
22 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
37
5
0
16 May 2024
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang
Difan Liu
Yan Kang
Yijun Li
Zhe Lin
N. Jha
Yuchen Liu
31
13
0
08 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
37
30
0
26 Apr 2024
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning
Liang Zhang
Anwen Hu
Haiyang Xu
Mingshi Yan
Yichen Xu
Qin Jin
Ji Zhang
Fei Huang
51
15
0
25 Apr 2024
SNP: Structured Neuron-level Pruning to Preserve Attention Scores
Kyunghwan Shim
Jaewoong Yun
Shinkook Choi
25
0
0
18 Apr 2024
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song
Seunghyuk Oh
Sangwoo Mo
Jaehyung Kim
Sukmin Yun
Jung-Woo Ha
Jinwoo Shin
30
14
0
16 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
51
2
0
15 Apr 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
30
2
0
14 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
43
24
0
10 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
83
88
0
08 Apr 2024
MLP Can Be A Good Transformer Learner
Sihao Lin
Pumeng Lyu
Dongrui Liu
Tao Tang
Xiaodan Liang
Andy Song
Xiaojun Chang
ViT
37
11
0
08 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
68
56
0
04 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
41
32
0
01 Apr 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu-Gang Jiang
SSL
24
1
0
31 Mar 2024
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang
Yunhang Shen
Jiao Xie
Baochang Zhang
Gaoqi He
Ke Li
Xing Sun
Shaohui Lin
42
3
0
31 Mar 2024
The Need for Speed: Pruning Transformers with One Recipe
Samir Khaki
Konstantinos N. Plataniotis
32
10
0
26 Mar 2024
Block Selective Reprogramming for On-device Training of Vision Transformers
Sreetama Sarkar
Souvik Kundu
Kai Zheng
P. Beerel
37
2
0
25 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang
Mu Cai
Bingxin Xu
Yong Jae Lee
Yan Yan
VLM
31
107
0
22 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe
ViT
55
12
0
20 Mar 2024
Previous
1
2
3
4
5
6
7
Next