Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2101.11986
Cited By
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
28 January 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
50 / 396 papers shown
Title
MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen
Tan Yu
Ping Li
ViT
37
43
0
25 Oct 2021
SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu
Jinghan Yao
Junge Zhang
Martin Danelljan
Hang Xu
Weiguo Gao
Chunjing Xu
Thomas B. Schon
Li Zhang
18
161
0
22 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
38
268
0
19 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
38
227
0
18 Oct 2021
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
79
172
0
16 Oct 2021
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
Shentong Mo
Xiao Fu
Chenyang Hong
Yizhen Chen
Yuxuan Zheng
Xiangru Tang
Zhiqiang Shen
Eric P. Xing
Yanyan Lan
AI4CE
23
19
0
11 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
30
39
0
10 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
47
78
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
218
1,213
0
05 Oct 2021
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
212
487
0
01 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
114
20
0
29 Sep 2021
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Tao Wang
Li Yuan
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
ViT
24
82
0
15 Sep 2021
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
Tongkun Xu
Weihua Chen
Pichao Wang
Fan Wang
Hao Li
R. L. Jin
ViT
59
215
0
13 Sep 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández-Orallo
35
113
0
12 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang
Xue Wang
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
R. L. Jin
19
41
0
08 Sep 2021
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
R. Liu
Hanming Deng
Yangyi Huang
Xiaoyu Shi
Lewei Lu
Wenxiu Sun
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
27
124
0
07 Sep 2021
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
Ziniu Wan
Zhengjia Li
Maoqing Tian
Jianbo Liu
Shuai Yi
Hongsheng Li
3DH
35
80
0
06 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
46
105
0
30 Aug 2021
Do Vision Transformers See Like Convolutional Neural Networks?
M. Raghu
Thomas Unterthiner
Simon Kornblith
Chiyuan Zhang
Alexey Dosovitskiy
ViT
67
924
0
19 Aug 2021
Causal Attention for Unbiased Visual Recognition
Tan Wang
Chan Zhou
Qianru Sun
Hanwang Zhang
OOD
CML
32
108
0
19 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
183
476
0
12 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami
Masato Taki
27
12
0
09 Aug 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
32
142
0
09 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
21
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
32
201
0
03 Aug 2021
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
Junyuan Gao
Maoguo Gong
Xuelong Li
ViT
19
46
0
02 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
32
257
0
31 Jul 2021
DPT: Deformable Patch-based Transformer for Visual Recognition
Zhiyang Chen
Yousong Zhu
Chaoyang Zhao
Guosheng Hu
Wei Zeng
Jinqiao Wang
Ming Tang
ViT
16
98
0
30 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
24
187
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
33
231
0
21 Jul 2021
All the attention you need: Global-local, spatial-channel attention for image retrieval
Chull Hwan Song
Hye Joo Han
Yannis Avrithis
16
39
0
16 Jul 2021
A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): from Convolutional Neural Networks to Visual Transformers
Penghui Zhao
Chen Li
M. Rahaman
Hao Xu
Hechen Yang
Hongzan Sun
Tao Jiang
M. Grzegorzek
VLM
30
39
0
16 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Philip Torr
50
27
0
13 Jul 2021
Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition
Mouath Aouayeb
W. Hamidouche
Catherine Soladié
K. Kpalma
Renaud Séguier
ViT
28
57
0
07 Jul 2021
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
Zhiwei Hao
Jianyuan Guo
Ding Jia
Kai Han
Yehui Tang
Chao Zhang
Dacheng Tao
Yunhe Wang
ViT
33
68
0
03 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
36
259
0
01 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
28
450
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
42
428
0
01 Jul 2021
Augmented Shortcuts for Vision Transformers
Yehui Tang
Kai Han
Chang Xu
An Xiao
Yiping Deng
Chao Xu
Yunhe Wang
ViT
14
39
0
30 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
45
26
0
28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
50
1,607
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
52
314
0
24 Jun 2021
IA-RED
2
^2
2
: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Yi Ding
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
39
153
0
23 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li-xin Yuan
Mingg-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
24
205
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Mingg-Ming Cheng
ViT
29
219
0
22 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
42
499
0
17 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
30
124
0
10 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
17
575
0
10 Jun 2021
CAT: Cross Attention in Vision Transformer
Hezheng Lin
Xingyi Cheng
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Qing Song
Wei Yuan
ViT
32
149
0
10 Jun 2021
Previous
1
2
3
4
5
6
7
8
Next