Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.09461
Cited By
Token Merging: Your ViT But Faster
17 October 2022
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Token Merging: Your ViT But Faster"
50 / 321 papers shown
Title
Rethinking Token Reduction for State Space Models
Zheng Zhan
Yushu Wu
Zhenglun Kong
Changdi Yang
Yifan Gong
Xuan Shen
Xue Lin
Pu Zhao
Yanzhi Wang
Mamba
32
4
0
16 Oct 2024
Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation
Zixin Wang
Dong Gong
Sen Wang
Zi Huang
Yadan Luo
VLM
34
0
0
16 Oct 2024
Selection-p: Self-Supervised Task-Agnostic Prompt Compression for Faithfulness and Transferability
Tsz Ting Chung
Leyang Cui
Lemao Liu
Xinting Huang
Shuming Shi
Dit-Yan Yeung
38
1
0
15 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
52
0
0
14 Oct 2024
I-Max: Maximize the Resolution Potential of Pre-trained Rectified Flow Transformers with Projected Flow
Ruoyi Du
Dongyang Liu
Le Zhuo
Qin Qi
Hongsheng Li
Zhanyu Ma
Peng Gao
29
1
0
10 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
85
0
0
06 Oct 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
Alan Baade
Puyuan Peng
David Harwath
50
3
0
05 Oct 2024
Dynamic Diffusion Transformer
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Yibing Song
Gao Huang
Fan Wang
Yang You
77
12
0
04 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
84
25
0
04 Oct 2024
FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model
Yichen Lu
Jiaqi Song
Chao-Han Huck Yang
Shinji Watanabe
21
0
0
03 Oct 2024
Exploring Token Pruning in Vision State Space Models
Zheng Zhan
Zhenglun Kong
Yifan Gong
Yushu Wu
Zichong Meng
...
Xuan Shen
Stratis Ioannidis
Wei Niu
Pu Zhao
Yanzhi Wang
32
9
0
27 Sep 2024
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling
Benjamin Clavié
Antoine Chaffin
Griffin Adams
28
2
0
23 Sep 2024
Patch Ranking: Efficient CLIP by Learning to Rank Local Patches
Cheng-En Wu
Jinhong Lin
Yu Hen Hu
Pedro Morgado
VLM
25
0
0
22 Sep 2024
Agglomerative Token Clustering
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
36
1
0
18 Sep 2024
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Weihao Ye
Qiong Wu
Wenhao Lin
Yiyi Zhou
VLM
41
10
0
16 Sep 2024
Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving
Yunsheng Ma
Amr Abdelraouf
Rohit Gupta
Ziran Wang
Kyungtae Han
28
3
0
16 Sep 2024
Famba-V: Fast Vision Mamba with Cross-Layer Token Fusion
Hui Shen
Zhongwei Wan
Xin Wang
Mi Zhang
Mamba
32
6
0
15 Sep 2024
Token Turing Machines are Efficient Vision Models
Purvish Jajal
Nick Eliopoulos
Benjamin Shiue-Hal Chou
George K. Thiravathukal
James C. Davis
Yung-Hsiang Lu
98
0
0
11 Sep 2024
Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding
Xiaoyu Liang
Jiayuan Yu
Lianrui Mu
Jiedong Zhuang
Jiaqi Hu
Yuchen Yang
Jiangnan Ye
Lu Lu
Jian Chen
Haoji Hu
VLM
43
2
0
10 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
135
7
0
02 Sep 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang
Dingkang Liang
Zichang Tan
Xiaoqing Ye
Cheng Zhang
Jingdong Wang
Xiang Bai
ViT
51
2
0
01 Sep 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
Winston H. Hsu
Shang-Hong Lai
26
3
0
30 Aug 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
42
1
0
30 Aug 2024
GlaLSTM: A Concurrent LSTM Stream Framework for Glaucoma Detection via Biomarker Mining
Cheng Huang
Weizheng Xie
Jian Zhou
Karanjit S Kooner
Karanjit Kooner
Yishen Liu
35
1
0
28 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
29
1
0
13 Aug 2024
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie
Yehui Tang
Jianyuan Guo
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
38
2
0
13 Aug 2024
ZePo: Zero-Shot Portrait Stylization with Faster Sampling
Jin Liu
Huaibo Huang
Jie Cao
Ran He
DiffM
36
0
0
10 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM
MLLM
66
19
0
04 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen
Mahdieh Hatamian
Yu Wu
48
3
0
02 Aug 2024
Boosting Audio Visual Question Answering via Key Semantic-Aware Cues
Guangyao Li
Henghui Du
Di Hu
24
4
0
30 Jul 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
45
7
0
29 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
18
0
0
28 Jul 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu
Zhuoyang Zhang
Samir Khaki
Shang Yang
Haotian Tang
Chenfeng Xu
Kurt Keutzer
Song Han
SSeg
51
1
0
26 Jul 2024
DAM: Towards A Foundation Model for Time Series Forecasting
L. N. Darlow
Qiwen Deng
Ahmed Hassan
Martin Asenov
Rajkarn Singh
Artjom Joosen
Adam Barker
Amos Storkey
AI4TS
AI4CE
40
3
0
25 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
30
3
0
24 Jul 2024
Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions
Kai-Chun Liu
Zhihang Fu
Chao Chen
Sheng Jin
Ze Chen
Mingyuan Tao
Rongxin Jiang
Jieping Ye
VLM
OODD
58
4
0
23 Jul 2024
Exploring The Neural Burden In Pruned Models: An Insight Inspired By Neuroscience
Zeyu Wang
Weichen Dai
Xiangyu Zhou
Ji Qi
Yi Zhou
46
0
0
23 Jul 2024
Efficient Visual Transformer by Learnable Token Merging
Yancheng Wang
Yingzhen Yang
ViT
44
1
0
21 Jul 2024
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
Renshan Zhang
Yibo Lyu
Rui Shao
Gongwei Chen
Weili Guan
Liqiang Nie
39
9
0
19 Jul 2024
Pose-guided multi-task video transformer for driver action recognition
Ricardo Pizarro
Roberto Valle
L. Bergasa
J. M. Buenaposada
Luis Baumela
ViT
37
0
0
18 Jul 2024
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu
Jiaxin Chen
Hanwen Zhong
Di Huang
Yun Wang
MQ
46
9
0
17 Jul 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju
Haicheng Wang
Haozhe Cheng
Xu Chen
Zhonghua Zhai
Weilin Huang
Jinsong Lan
Shuai Xiao
Bo Zheng
VLM
49
5
0
16 Jul 2024
MaskVD: Region Masking for Efficient Video Object Detection
Sreetama Sarkar
Gourav Datta
Souvik Kundu
Kai Zheng
Chirayata Bhattacharyya
P. Beerel
25
3
0
16 Jul 2024
TCFormer: Visual Recognition via Token Clustering Transformer
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chao Qian
Wanli Ouyang
Ping Luo
Xiaogang Wang
33
3
0
16 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
32
5
0
15 Jul 2024
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
Zheng Wang
Boxiao Jin
Zhongzhi Yu
Minjia Zhang
MoMe
37
23
0
11 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
Ye Li
Chen Tang
Yuan Meng
Jiajun Fan
Zenghao Chai
Xinzhu Ma
Zhi Wang
Wenwu Zhu
31
1
0
06 Jul 2024
TokenPacker: Efficient Visual Projector for Multimodal LLM
Wentong Li
Yuqian Yuan
Jian Liu
Dongqi Tang
Song Wang
Jie Qin
Jianke Zhu
Lei Zhang
MLLM
37
50
0
02 Jul 2024
Efficient Sparse Attention needs Adaptive Token Release
Chaoran Zhang
Lixin Zou
Dan Luo
Min Tang
Xiangyang Luo
Zihao Li
Chenliang Li
41
2
0
02 Jul 2024
Previous
1
2
3
4
5
6
7
Next