2103.00112
Cited By
Transformer in Transformer
27 February 2021
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
Papers citing "Transformer in Transformer"
50 / 553 papers shown
Title
Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization
Yingying Zhu
Hongji Yang
Yuxin Lu
Qiang Huang
19
32
0
03 Feb 2023
Robust Transformer with Locality Inductive Bias and Feature Normalization
Omid Nejati Manzari
Hossein Kashiani
Hojat Asgarian Dehkordi
S. B. Shokouhi
ViT
24
14
0
27 Jan 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
40
2
0
25 Jan 2023
SAT: Size-Aware Transformer for 3D Point Cloud Semantic Segmentation
Yueze Wang
Yongping Xiong
C. Chiu
Fangyu Liu
Xiangyang Gong
3DPC
ViT
30
6
0
17 Jan 2023
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashanka Venkataramanan
Amir Ghodrati
Yuki M. Asano
Fatih Porikli
A. Habibian
ViT
18
25
0
05 Jan 2023
Explainability and Robustness of Deep Visual Classification Models
Jindong Gu
AAML
44
2
0
03 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
16
10
0
02 Jan 2023
Multi-Stage Spatio-Temporal Aggregation Transformer for Video Person Re-identification
Ziyi Tang
Ruimao Zhang
Zhanglin Peng
Jinrui Chen
Liang Lin
33
18
0
02 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
100
48
0
31 Dec 2022
Transformer in Transformer as Backbone for Deep Reinforcement Learning
Hangyu Mao
Rui Zhao
Hao Chen
Jianye Hao
Yiqun Chen
Dong Li
Junge Zhang
Zhen Xiao
OffRL
36
8
0
30 Dec 2022
A Close Look at Spatial Modeling: From Attention to Convolution
Xu Ma
Huan Wang
Can Qin
Kunpeng Li
Xing Zhao
Jie Fu
Yun Fu
ViT
3DPC
25
11
0
23 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
37
14
0
21 Dec 2022
DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation
Feilong Tang
Qingming Huang
Jinfeng Wang
Xianxu Hou
Jionglong Su
Jingxin Liu
ViT
MedIm
32
49
0
21 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
27
10
0
15 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
38
21
0
13 Dec 2022
CamoFormer: Masked Separable Attention for Camouflaged Object Detection
Bo Yin
Xuying Zhang
Qibin Hou
Bo Sun
Deng-Ping Fan
Luc Van Gool
28
51
0
10 Dec 2022
Joint Spatio-Temporal Modeling for the Semantic Change Detection in Remote Sensing Images
L. Ding
Jing Zhang
Kai Zhang
Haitao Guo
Bing Liu
Lorenzo Bruzzone
29
48
0
10 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Alahari Karteek
28
2
0
29 Nov 2022
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Yash Bhalgat
Joao F. Henriques
Andrew Zisserman
ViT
27
6
0
28 Nov 2022
VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
Kashu Yamazaki
Khoa T. Vo
Sang Truong
Bhiksha Raj
Ngan Le
29
35
0
28 Nov 2022
Semantic-Aware Local-Global Vision Transformer
Jiatong Zhang
Zengwei Yao
Fanglin Chen
Guangming Lu
Wenjie Pei
ViT
25
0
0
27 Nov 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
Spatial-Temporal Attention Network for Open-Set Fine-Grained Image Recognition
Jiaying Sun
Hong Wang
Qiulei Dong
3DPC
ViT
20
1
0
25 Nov 2022
AFR-Net: Attention-Driven Fingerprint Recognition Network
Steven A. Grosz
A.K. Jain
ViT
34
29
0
25 Nov 2022
GhostNetV2: Enhance Cheap Operation with Long-Range Attention
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Chao Xu
Yunhe Wang
20
270
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis
Yilan Zhang
Feng-ying Xie
Jianqing Chen
MedIm
28
32
0
21 Nov 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
27
29
0
21 Nov 2022
STGlow: A Flow-based Generative Framework with Dual Graphormer for Pedestrian Trajectory Prediction
Rongqin Liang
Yuanman Li
Jiantao Zhou
Xia Li
39
12
0
21 Nov 2022
Delving into Transformer for Incremental Semantic Segmentation
Zekai Xu
Mingying Zhang
Jiayue Hou
Xing Gong
Chuan Wen
Chengjie Wang
Junge Zhang
CLL
24
1
0
18 Nov 2022
Dynamic Temporal Filtering in Video Models
Fuchen Long
Zhaofan Qiu
Yingwei Pan
Ting Yao
Chong-Wah Ngo
Tao Mei
AI4TS
27
17
0
15 Nov 2022
Monocular BEV Perception of Road Scenes via Front-to-Top View Projection
Wenxi Liu
Qi Li
Weixiang Yang
Jiaxin Cai
Yuanlong Yu
Yuexin Ma
Shengfeng He
Jianxiong Pan
23
1
0
15 Nov 2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
Peiyan Dong
Mengshu Sun
Alec Lu
Yanyue Xie
Li-Yu Daisy Liu
...
Xin Meng
Zechao Li
Xue Lin
Zhenman Fang
Yanzhi Wang
ViT
34
59
0
15 Nov 2022
Fcaformer: Forward Cross Attention in Hybrid Vision Transformer
Haokui Zhang
Wenze Hu
Xiaoyu Wang
ViT
19
8
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
38
660
0
10 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
26
55
0
07 Nov 2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Jiehua Zhang
Xueyang Zhang
Z. Su
Zitong Yu
Yanghe Feng
Xin Lu
M. Pietikäinen
Li Liu
MQ
30
0
0
04 Nov 2022
Relative Attention-based One-Class Adversarial Autoencoder for Continuous Authentication of Smartphone Users
Mingming Hu
Kun Zhang
Ruibang You
Bibo Tu
AAML
22
1
0
30 Oct 2022
TFormer: 3D Tooth Segmentation in Mesh Scans with Geometry Guided Transformer
Huimin Xiong
Kunle Li
K. Tan
Yang Feng
Qiufeng Wang
Jinxiang Hao
Zuo-Qiang Liu
MedIm
40
1
0
29 Oct 2022
Grafting Vision Transformers
Jong Sung Park
Kumara Kahatapitiya
Donghyun Kim
Shivchander Sudalairaj
Quanfu Fan
Michael S. Ryoo
ViT
29
2
0
28 Oct 2022
Fully-attentive and interpretable: vision and video vision transformers for pain detection
Giacomo Fiorentini
Itir Onal Ertugrul
A. A. Salah
MedIm
ViT
21
2
0
27 Oct 2022
M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
42
81
0
26 Oct 2022
Automatic Diagnosis of Myocarditis Disease in Cardiac MRI Modality using Deep Transformers and Explainable Artificial Intelligence
M. Jafari
A. Shoeibi
Navid Ghassemi
Jónathan Heras
Saiguang Ling
...
Shuihua Wang
R. Alizadehsani
Juan M Gorriz
U. Acharya
Hamid Alinejad-Rokny
MedIm
22
11
0
26 Oct 2022
MetaFormer Baselines for Vision
Weihao Yu
Chenyang Si
Pan Zhou
Mi Luo
Yichen Zhou
Jiashi Feng
Shuicheng Yan
Xinchao Wang
MoE
40
156
0
24 Oct 2022
LCPFormer: Towards Effective 3D Point Cloud Analysis via Local Context Propagation in Transformers
Zhuo Huang
Zhiyou Zhao
Banghuai Li
Jungong Han
3DPC
ViT
35
55
0
23 Oct 2022
S2WAT: Image Style Transfer via Hierarchical Vision Transformer using Strips Window Attention
Chi Zhang
Lu Zhou
Lei Wang
Zaiyan Dai
Jun Yang
ViT
34
23
0
22 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
29
18
0
21 Oct 2022
Boosting vision transformers for image retrieval
Chull Hwan Song
Jooyoung Yoon
Shunghyun Choi
Yannis Avrithis
ViT
34
32
0
21 Oct 2022
Similarity of Neural Architectures using Adversarial Attack Transferability
Jaehui Hwang
Dongyoon Han
Byeongho Heo
Song Park
Sanghyuk Chun
Jong-Seok Lee
AAML
32
1
0
20 Oct 2022
Rethinking Bias Mitigation: Fairer Architectures Make for Fairer Face Recognition
Samuel Dooley
R. Sukthanker
John P. Dickerson
Colin White
Frank Hutter
Micah Goldblum
CVBM
24
21
0
18 Oct 2022