2101.11986
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
28 January 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
46 / 396 papers shown
Title
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
49
1,167
0
09 Jun 2021
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
70
1,060
0
08 Jun 2021
On the Connection between Local Attention and Dynamic Depth-wise Convolution
Qi Han
Zejia Fan
Qi Dai
Lei-huan Sun
Ming-Ming Cheng
Jiaying Liu
Jingdong Wang
ViT
24
105
0
08 Jun 2021
Fully Transformer Networks for Semantic Image Segmentation
Sitong Wu
Tianyi Wu
Fangjian Lin
Sheng Tian
Guodong Guo
ViT
34
39
0
08 Jun 2021
DoubleField: Bridging the Neural Surface and Radiance Fields for High-fidelity Human Reconstruction and Rendering
Ruizhi Shao
Hongwen Zhang
He Zhang
Mingjia Chen
Yan-Pei Cao
Tao Yu
Yebin Liu
3DH
17
64
0
07 Jun 2021
Reveal of Vision Transformers Robustness against Adversarial Attacks
Ahmed Aldahdooh
W. Hamidouche
Olivier Déforges
ViT
15
56
0
07 Jun 2021
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
Shaocheng Jia
Xin Pei
W. Yao
S. Wong
3DPC
MDE
43
19
0
07 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
65
329
0
07 Jun 2021
Patch Slimming for Efficient Vision Transformers
Yehui Tang
Kai Han
Yunhe Wang
Chang Xu
Jianyuan Guo
Chao Xu
Dacheng Tao
ViT
21
163
0
05 Jun 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
44
4,836
0
31 May 2021
Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model
Jiangning Zhang
Chao Xu
Jian Li
Wenzhou Chen
Yabiao Wang
Ying Tai
Shuo Chen
Chengjie Wang
Feiyue Huang
Yong Liu
32
22
0
31 May 2021
VidFace: A Full-Transformer Solver for Video FaceHallucination with Unaligned Tiny Snapshots
Y. Gan
Yawei Luo
Xin Yu
Bang Zhang
Yi Yang
ViT
CVBM
25
3
0
31 May 2021
Dual-stream Network for Visual Recognition
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
25
63
0
31 May 2021
ResT: An Efficient Transformer for Visual Recognition
Qing-Long Zhang
Yubin Yang
ViT
29
229
0
28 May 2021
KVT: k-NN Attention for Boosting Vision Transformers
Pichao Wang
Xue Wang
F. Wang
Ming Lin
Shuning Chang
Hao Li
R. L. Jin
ViT
51
105
0
28 May 2021
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
265
621
0
21 May 2021
Vision Transformers are Robust Learners
Sayak Paul
Pin-Yu Chen
ViT
28
304
0
17 May 2021
Is the aspect ratio of cells important in deep learning? A robust comparison of deep learning methods for multi-scale cytopathology cell image classification: from convolutional neural networks to visual transformers
Wanli Liu
Chen Li
M. Rahaman
Tao Jiang
Hongzan Sun
...
Weiming Hu
Hao Chen
Changhao Sun
Yudong Yao
M. Grzegorzek
41
54
0
16 May 2021
Conformer: Local Features Coupling Global Representations for Visual Recognition
Zhiliang Peng
Wei Huang
Shanzhi Gu
Lingxi Xie
Yaowei Wang
Jianbin Jiao
QiXiang Ye
ViT
21
527
0
09 May 2021
Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo
Zheng-Ning Liu
Tai-Jiang Mu
Shimin Hu
25
472
0
05 May 2021
Attention for Image Registration (AiR): an unsupervised Transformer approach
Zihao Wang
H. Delingette
ViT
MedIm
25
7
0
05 May 2021
Vision Transformers with Patch Diversification
Chengyue Gong
Dilin Wang
Meng Li
Vikas Chandra
Qiang Liu
ViT
42
62
0
26 Apr 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
120
209
0
26 Apr 2021
Visual Saliency Transformer
Nian Liu
Ni Zhang
Kaiyuan Wan
Ling Shao
Junwei Han
ViT
253
352
0
25 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
25
203
0
22 Apr 2021
Escaping the Big Data Paradigm with Compact Transformers
Ali Hassani
Steven Walton
Nikhil Shah
Abulikemu Abuduweili
Jiachen Li
Humphrey Shi
59
462
0
12 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
28
830
0
05 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
27
986
0
31 Mar 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
54
1,876
0
29 Mar 2021
On the Adversarial Robustness of Vision Transformers
Rulin Shao
Zhouxing Shi
Jinfeng Yi
Pin-Yu Chen
Cho-Jui Hsieh
ViT
33
137
0
29 Mar 2021
TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization
Wei Gao
Fang Wan
Xingjia Pan
Zhiliang Peng
Qi Tian
Zhenjun Han
Bolei Zhou
QiXiang Ye
ViT
WSOL
30
198
0
27 Mar 2021
Exploiting Temporal Contexts with Strided Transformer for 3D Human Pose Estimation
Wenhao Li
Hong Liu
Runwei Ding
Mengyuan Liu
Pichao Wang
Wenming Yang
ViT
25
189
0
26 Mar 2021
BossNAS: Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search
Changlin Li
Tao Tang
Guangrun Wang
Jiefeng Peng
Bing Wang
Xiaodan Liang
Xiaojun Chang
ViT
46
105
0
23 Mar 2021
DeepViT: Towards Deeper Vision Transformer
Daquan Zhou
Bingyi Kang
Xiaojie Jin
Linjie Yang
Xiaochen Lian
Zihang Jiang
Qibin Hou
Jiashi Feng
ViT
42
510
0
22 Mar 2021
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
56
467
0
22 Mar 2021
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
Stéphane d'Ascoli
Hugo Touvron
Matthew L. Leavitt
Ari S. Morcos
Giulio Biroli
Levent Sagun
ViT
58
805
0
19 Mar 2021
Scalable Vision Transformers with Hierarchical Pooling
Zizheng Pan
Bohan Zhuang
Jing Liu
Haoyu He
Jianfei Cai
ViT
27
126
0
19 Mar 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
289
1,524
0
27 Feb 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
227
2,430
0
04 Jan 2021
Concept Generalization in Visual Representation Learning
Mert Bulent Sariyildiz
Yannis Kalantidis
Diane Larlus
Alahari Karteek
SSL
28
50
0
10 Dec 2020
A Simple Baseline for Pose Tracking in Videos of Crowded Scenes
Li Yuan
Shuning Chang
Ziyuan Huang
Yichen Zhou
Yupeng Chen
Xuecheng Nie
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
VOT
35
4
0
16 Oct 2020
Knowledge Enhanced Contextual Word Representations
Matthew E. Peters
Mark Neumann
Robert L. Logan IV
Roy Schwartz
Vidur Joshi
Sameer Singh
Noah A. Smith
234
656
0
09 Sep 2019
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
950
20,572
0
17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
281
31,267
0
16 Jan 2013