Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00112
Cited By
v1
v2
v3 (latest)
Transformer in Transformer
27 February 2021
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Github (4228★)
Papers citing
"Transformer in Transformer"
50 / 558 papers shown
Title
Adaptive Token Sampling For Efficient Vision Transformers
Mohsen Fayyaz
Soroush Abbasi Koohpayegani
F. Jafari
Sunando Sengupta
Hamid Reza Vaezi Joze
Eric Sommerlade
Hamed Pirsiavash
Juergen Gall
ViT
135
160
0
30 Nov 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
93
231
0
30 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
74
6
0
26 Nov 2021
Global Interaction Modelling in Vision Transformer via Super Tokens
Ammarah Farooq
Muhammad Awais
S. Ahmed
J. Kittler
ViT
59
7
0
25 Nov 2021
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
Wenhao Li
Hong Liu
Hao Tang
Pichao Wang
Luc Van Gool
ViT
147
254
0
24 Nov 2021
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
108
32
0
24 Nov 2021
An Image Patch is a Wave: Phase-Aware Vision MLP
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Yanxi Li
Chao Xu
Yunhe Wang
115
136
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip Torr
Guoying Zhao
ViT
MedIm
213
174
0
23 Nov 2021
MetaFormer Is Actually What You Need for Vision
Weihao Yu
Mi Luo
Pan Zhou
Chenyang Si
Yichen Zhou
Xinchao Wang
Jiashi Feng
Shuicheng Yan
181
928
0
22 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding
Jaesung Choe
Chunghyun Park
François Rameau
Jaesik Park
In So Kweon
3DPC
128
102
0
22 Nov 2021
DuDoTrans: Dual-Domain Transformer Provides More Attention for Sinogram Restoration in Sparse-View CT Reconstruction
Ce Wang
Kun Shang
Haimiao Zhang
Qian Li
Yuan Hui
S. Kevin Zhou
ViT
MedIm
78
28
0
21 Nov 2021
Are Vision Transformers Robust to Patch Perturbations?
Jindong Gu
Volker Tresp
Yao Qin
AAML
ViT
115
65
0
20 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
110
34
0
16 Nov 2021
Attention Mechanisms in Computer Vision: A Survey
Meng-Hao Guo
Tianhan Xu
Jiangjiang Liu
Zheng-Ning Liu
Peng-Tao Jiang
Tai-Jiang Mu
Song-Hai Zhang
Ralph Robert Martin
Ming-Ming Cheng
Shimin Hu
144
1,745
0
15 Nov 2021
Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi
Huiyu Wang
Yingwei Li
Zizhang Li
Alan Yuille
ViT
81
3
0
15 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
201
356
0
11 Nov 2021
Sliced Recursive Transformer
Zhiqiang Shen
Zechun Liu
Eric P. Xing
ViT
61
27
0
09 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Hai-Tao Zheng
Li Tao
Dun Liang
Haitao Zheng
217
100
0
07 Nov 2021
Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems
Wenqing Zheng
Qiangqiang Guo
H. Yang
Peihao Wang
Zhangyang Wang
AI4CE
46
12
0
29 Oct 2021
3D Object Tracking with Transformer
Yubo Cui
Zheng Fang
Jiayao Shan
Zuoxu Gu
Sifan Zhou
ViT
3DPC
69
61
0
28 Oct 2021
MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen
Tan Yu
Ping Li
ViT
69
46
0
25 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
149
236
0
18 Oct 2021
SpecTNT: a Time-Frequency Transformer for Music Audio
Weiyi Lu
Ju-Chiang Wang
Minz Won
Keunwoo Choi
Xuchen Song
ViT
62
46
0
18 Oct 2021
StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning
Jinghuan Shang
Kumara Kahatapitiya
Xiang Li
Michael S. Ryoo
OffRL
100
36
0
12 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
86
45
0
10 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
102
80
0
06 Oct 2021
Implicit and Explicit Attention for Zero-Shot Learning
Faisal Alamri
Anjan Dutta
140
7
0
02 Oct 2021
Seeking an Optimal Approach for Computer-Aided Pulmonary Embolism Detection
N. Islam
S. Gehlot
Zongwei Zhou
Michael B. Gotway
Jianming Liang
OOD
159
11
0
15 Sep 2021
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
Tongkun Xu
Weihua Chen
Pichao Wang
Fan Wang
Hao Li
Rong Jin
ViT
163
221
0
13 Sep 2021
Towards Transferable Adversarial Attacks on Vision Transformers
Zhipeng Wei
Jingjing Chen
Micah Goldblum
Zuxuan Wu
Tom Goldstein
Yu-Gang Jiang
ViT
AAML
106
124
0
09 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang
Xue Wang
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
Rong Jin
109
43
0
08 Sep 2021
Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement
Wenxi Liu
Qi Li
Xin Lin
Weixiang Yang
Shengfeng He
Yuanlong Yu
78
8
0
06 Sep 2021
Searching for Efficient Multi-Stage Vision Transformers
Yi-Lun Liao
S. Karaman
Vivienne Sze
ViT
78
19
0
01 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
81
105
0
30 Aug 2021
Exploring and Improving Mobile Level Vision Transformers
Pengguang Chen
Yixin Chen
Shu Liu
Ming-Hsuan Yang
Jiaya Jia
ViT
104
4
0
30 Aug 2021
Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net
Yu Qiu
Yun-Hai Liu
Le Zhang
Jing Xu
ViT
71
31
0
17 Aug 2021
Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers
B. Dong
Wenhai Wang
Deng-Ping Fan
Jinpeng Li
Huazhu Fu
Ling Shao
ViT
MedIm
97
331
0
16 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami
Masato Taki
84
12
0
09 Aug 2021
S
2
^2
2
-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
75
54
0
02 Aug 2021
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
Junyuan Gao
Maoguo Gong
Xuelong Li
ViT
98
47
0
02 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
212
273
0
31 Jul 2021
Multi-Head Self-Attention via Vision Transformer for Zero-Shot Learning
Faisal Alamri
Anjan Dutta
ViT
36
23
0
30 Jul 2021
DPT: Deformable Patch-based Transformer for Visual Recognition
Zhiyang Chen
Yousong Zhu
Chaoyang Zhao
Guosheng Hu
Wei Zeng
Jinqiao Wang
Ming Tang
ViT
70
101
0
30 Jul 2021
Contextual Transformer Networks for Visual Recognition
Yehao Li
Ting Yao
Yingwei Pan
Tao Mei
ViT
108
494
0
26 Jul 2021
AS-MLP: An Axial Shifted MLP Architecture for Vision
Dongze Lian
Zehao Yu
Xing Sun
Shenghua Gao
124
192
0
18 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Philip Torr
128
27
0
13 Jul 2021
What Makes for Hierarchical Vision Transformer?
Yuxin Fang
Xinggang Wang
Rui Wu
Wenyu Liu
ViT
56
9
0
05 Jul 2021
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
Zhiwei Hao
Jianyuan Guo
Ding Jia
Kai Han
Yehui Tang
Chao Zhang
Dacheng Tao
Yunhe Wang
ViT
142
73
0
03 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
195
997
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
88
437
0
01 Jul 2021
Previous
1
2
3
...
10
11
12
9
Next