Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.15668
Cited By
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
30 November 2021
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu-Gang Jiang
Ser-Nam Lim
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"AdaViT: Adaptive Vision Transformers for Efficient Image Recognition"
44 / 44 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
169
0
0
06 May 2025
Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches
Guodong Shen
Yuqi Ouyang
Junru Lu
Yixuan Yang
Victor Sanchez
38
1
0
20 Apr 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Hao Luo
Yibing Song
Gao Huang
Fan Wang
Yang You
74
0
0
09 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu
Khoi Duc Nguyen
Preeti Mukherjee
Saurabh Bagchi
Somali Chaterji
Yingyu Liang
Yin Li
LRM
49
1
0
13 Mar 2025
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo
Yingying Zhang
Xiaoyu Yang
Kang Wu
Qi Zhu
Lei Liang
Jingdong Chen
Yansheng Li
67
1
0
10 Mar 2025
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong
Junjun Jiang
Kui Jiang
Jiahan Li
Yongbing Zhang
45
0
0
28 Feb 2025
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu
Zhihang Zhong
Yifan Zhan
Sheng Xu
Xiao Sun
3DGS
55
3
0
29 Dec 2024
Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs
Qizhe Zhang
Aosong Cheng
Ming Lu
Zhiyong Zhuo
Minqi Wang
Jiajun Cao
Shaobo Guo
Qi She
Shanghang Zhang
VLM
92
11
0
02 Dec 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
135
7
0
02 Sep 2024
FastAST: Accelerating Audio Spectrogram Transformer via Token Merging and Cross-Model Knowledge Distillation
Swarup Ranjan Behera
Abhishek Dhiman
Karthik Gowda
Aalekhya Satya Narayani
26
1
0
11 Jun 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
44
8
0
25 May 2024
LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
Lingfeng Liu
Dong Ni
Hangjie Yuan
ViT
35
0
0
03 Mar 2024
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi
Mohsen Ansari
Mohammadamin Fazli
A. Ejlali
Muhammad Shafique
Joerg Henkel
3DV
50
44
0
23 Feb 2024
DeSparsify: Adversarial Attack Against Token Sparsification Mechanisms in Vision Transformers
Oryan Yehezkel
Alon Zolfi
Amit Baras
Yuval Elovici
A. Shabtai
AAML
35
0
0
04 Feb 2024
Adaptive Depth Networks with Skippable Sub-Paths
Woochul Kang
33
1
0
27 Dec 2023
Improved TokenPose with Sparsity
Anning Li
ViT
34
0
0
16 Nov 2023
ProPainter: Improving Propagation and Transformer for Video Inpainting
Shangchen Zhou
Chongyi Li
Kelvin C. K. Chan
Chen Change Loy
ViT
35
90
0
07 Sep 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
40
14
0
20 Jun 2023
Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam
Steven Kolawole
S. Sethi
Nishant Bansali
Karina Nguyen
ViT
31
3
0
30 May 2023
Do We Really Need a Large Number of Visual Prompts?
Youngeun Kim
Yuhang Li
Abhishek Moitra
Ruokai Yin
Priyadarshini Panda
VLM
VPVLM
43
5
0
26 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
56
4
0
25 May 2023
AutoFocusFormer: Image Segmentation off the Grid
Chen Ziwen
K. Patnaik
Shuangfei Zhai
Alvin Wan
Zhile Ren
A. Schwing
Alex Colburn
Li Fuxin
24
9
0
24 Apr 2023
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
Jiahao Wang
Songyang Zhang
Yong Liu
Taiqiang Wu
Yujiu Yang
Xihui Liu
Kai-xiang Chen
Ping Luo
Dahua Lin
36
20
0
12 Apr 2023
SVT: Supertoken Video Transformer for Efficient Video Understanding
Chen-Ming Pan
Rui Hou
Hanchao Yu
Qifan Wang
Senem Velipasalar
Madian Khabsa
ViT
26
0
0
01 Apr 2023
Vision Transformers with Mixed-Resolution Tokenization
Tomer Ronen
Omer Levy
A. Golbert
ViT
11
21
0
01 Apr 2023
Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
Mao Ye
Gregory P. Meyer
Yuning Chai
Qiang Liu
32
8
0
09 Mar 2023
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
31
18
0
30 Jan 2023
GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
Miao Yin
Burak Uzkent
Yilin Shen
Hongxia Jin
Bo Yuan
ViT
29
13
0
13 Jan 2023
Guided Hybrid Quantization for Object detection in Multimodal Remote Sensing Imagery via One-to-one Self-teaching
Jiaqing Zhang
Jie Lei
Weiying Xie
Yunsong Li
Wenxuan Wang
MQ
27
18
0
31 Dec 2022
Prototypical Residual Networks for Anomaly Detection and Localization
H. Zhang
Zuxuan Wu
Zihan Wang
Zhineng Chen
Yuwei Jiang
UQCV
AI4TS
35
62
0
05 Dec 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
22
29
0
21 Nov 2022
TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
Zhiyang Dou
Qingxuan Wu
Chu-Hsing Lin
Zeyu Cao
Qiangqiang Wu
Weilin Wan
Taku Komura
Wenping Wang
24
39
0
19 Nov 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
51
417
0
17 Oct 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Haoyu Ma
Zhe Wang
Yifei Chen
Deying Kong
Liangjian Chen
Xingwei Liu
Xiangyi Yan
Hao Tang
Xiaohui Xie
ViT
35
47
0
16 Sep 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
37
108
0
28 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
33
263
0
22 Mar 2022
BEVT: BERT Pretraining of Video Transformers
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Yu-Gang Jiang
Luowei Zhou
Lu Yuan
ViT
39
203
0
02 Dec 2021
VidTr: Video Transformer Without Convolutions
Yanyi Zhang
Xinyu Li
Chunhui Liu
Bing Shuai
Yi Zhu
Biagio Brattoli
Hao Chen
I. Marsic
Joseph Tighe
ViT
136
193
0
23 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
289
1,524
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
307
3,625
0
24 Feb 2021
TransReID: Transformer-based Object Re-Identification
Shuting He
Haowen Luo
Pichao Wang
F. Wang
Hao Li
Wei Jiang
ViT
215
796
0
08 Feb 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,572
0
17 Apr 2017
1