Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00112
Cited By
v1
v2
v3 (latest)
Transformer in Transformer
27 February 2021
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
Re-assign community
ArXiv (abs)
PDF
HTML
Github (4228★)
Papers citing
"Transformer in Transformer"
50 / 558 papers shown
Title
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
163
6
0
16 Feb 2023
A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies
Hongyu Hè
Marko Kabić
78
2
0
13 Feb 2023
IH-ViT: Vision Transformer-based Integrated Circuit Appear-ance Defect Detection
Xiaoibin Wang
Shuang Gao
Yuntao Zou
Jia Guo
Chu Wang
43
5
0
09 Feb 2023
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Yawen Cui
Jiehua Zhang
Philip Torr
Guoying Zhao
ViT
MedIm
92
83
0
07 Feb 2023
CECT: Controllable Ensemble CNN and Transformer for COVID-19 Image Classification
Zhao Liu
Leizhao Shen
ViT
72
9
0
05 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
98
156
0
03 Feb 2023
Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization
Yingying Zhu
Hongji Yang
Yuxin Lu
Qiang Huang
56
35
0
03 Feb 2023
Robust Transformer with Locality Inductive Bias and Feature Normalization
Omid Nejati Manzari
Hossein Kashiani
Hojat Asgarian Dehkordi
S. B. Shokouhi
ViT
77
15
0
27 Jan 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
117
3
0
25 Jan 2023
SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation
Yueze Wang
Yongping Xiong
C. Chiu
Fangyu Liu
Xiangyang Gong
3DPC
ViT
66
6
0
17 Jan 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
111
30
0
05 Jan 2023
Explainability and Robustness of Deep Visual Classification Models
Jindong Gu
AAML
104
2
0
03 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
60
10
0
02 Jan 2023
Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification
Ziyi Tang
Ruimao Zhang
Zhanglin Peng
Jinrui Chen
Liang Lin
88
18
0
02 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
178
53
0
31 Dec 2022
Transformer in Transformer as Backbone for Deep Reinforcement Learning
Hangyu Mao
Rui Zhao
Hao Chen
Jianye Hao
Yiqun Chen
Dong Li
Junge Zhang
Zhen Xiao
OffRL
93
8
0
30 Dec 2022
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
66
12
0
23 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
91
14
0
21 Dec 2022
DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation
Feilong Tang
Qingming Huang
Jinfeng Wang
Xianxu Hou
Jionglong Su
Jingxin Liu
ViT
MedIm
80
55
0
21 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
49
10
0
15 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
109
22
0
13 Dec 2022
CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bo Yin
Xuying Zhang
Qibin Hou
Bo Sun
Deng-Ping Fan
Luc Van Gool
104
59
0
10 Dec 2022
Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images
L. Ding
Jing Zhang
Kai Zhang
Haitao Guo
Bing Liu
Lorenzo Bruzzone
59
56
0
10 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
86
2
0
29 Nov 2022
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Yash Bhalgat
Joao F. Henriques
Andrew Zisserman
ViT
104
6
0
28 Nov 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
80
38
0
28 Nov 2022
Semantic-Aware Local-Global Vision Transformer
Jiatong Zhang
Zengwei Yao
Fanglin Chen
Guangming Lu
Wenjie Pei
ViT
54
0
0
27 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
81
5
0
25 Nov 2022
Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition
Qiulei Dong
Hong Wang
Qiulei Dong
3DPC
ViT
65
1
0
25 Nov 2022
AFR-Net: Attention-Driven Fingerprint Recognition Network
Steven A. Grosz
A.K. Jain
ViT
108
30
0
25 Nov 2022
GhostNetV2: Enhance Cheap Operation with Long-Range Attention
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Chaoting Xu
Yunhe Wang
94
297
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Mingg-Ming Cheng
Jiashi Feng
ViT
128
141
0
22 Nov 2022
TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis
Yilan Zhang
Feng-ying Xie
Jianqing Chen
MedIm
56
36
0
21 Nov 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
95
39
0
21 Nov 2022
STGlow: A Flow-based Generative Framework with Dual Graphormer for Pedestrian Trajectory Prediction
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
84
15
0
21 Nov 2022
Delving into Transformer for Incremental Semantic Segmentation
Zekai Xu
Mingying Zhang
Jiayue Hou
Xing Gong
Chuan Wen
Chengjie Wang
Junge Zhang
CLL
66
1
0
18 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
103
18
0
15 Nov 2022
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection
Wenxi Liu
Qi Li
Weixiang Yang
Jiaxin Cai
Yuanlong Yu
Yuexin Ma
Shengfeng He
Jianxiong Pan
63
2
0
15 Nov 2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Zhaoyang Han
Mengshu Sun
Alec Lu
Yanyue Xie
Li-Yu Daisy Liu
...
Xin Meng
Zechao Li
Xue Lin
Zhenman Fang
Yanzhi Wang
ViT
99
72
0
15 Nov 2022
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
63
8
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
180
700
0
10 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
107
65
0
07 Nov 2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Jiehua Zhang
Xueyang Zhang
Z. Su
Zitong Yu
Yanghe Feng
Xin Lu
M. Pietikäinen
Li Liu
MQ
103
0
0
04 Nov 2022
Relative Attention-based One-Class Adversarial Autoencoder for Continuous Authentication of Smartphone Users
Mingming Hu
Kun Zhang
Ruibang You
Bibo Tu
AAML
60
1
0
30 Oct 2022
TFormer: 3D Tooth Segmentation in Mesh Scans with Geometry Guided Transformer
Huimin Xiong
Kunle Li
K. Tan
Yang Feng
Qiufeng Wang
Jinxiang Hao
Zuo-Qiang Liu
MedIm
78
1
0
29 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
97
3
0
28 Oct 2022
Fully-attentive and interpretable: vision and video vision transformers for pain detection
Giacomo Fiorentini
Itir Onal Ertugrul
A. A. Salah
MedIm
ViT
93
2
0
27 Oct 2022
M
3
^3
3
ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
75
88
0
26 Oct 2022
Automatic Diagnosis of Myocarditis Disease in Cardiac MRI Modality using Deep Transformers and Explainable Artificial Intelligence
M. Jafari
A. Shoeibi
Navid Ghassemi
Jónathan Heras
Saiguang Ling
...
Shuihua Wang
R. Alizadehsani
Juan M Gorriz
U. Acharya
Hamid Alinejad-Rokny
MedIm
98
11
0
26 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
99
171
0
24 Oct 2022
Previous
1
2
3
...
5
6
7
...
10
11
12
Next