Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

28 January 2021
Li Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
    ViT

Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"

50 / 396 papers shown
MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen
Tan Yu
Ping Li
ViT
37
43
0
25 Oct 2021
SOFT: Softmax-free Transformer with Linear Complexity
Jiachen Lu
Jinghan Yao
Junge Zhang
Martin Danelljan
Hang Xu
Weiguo Gao
Chunjing Xu
Thomas B. Schön
Li Zhang
18
161
0
22 Oct 2021
SSAST: Self-Supervised Audio Spectrogram Transformer
Yuan Gong
Cheng-I Jeff Lai
Yu-An Chung
James R. Glass
ViT
38
268
0
19 Oct 2021
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
38
227
0
18 Oct 2021
ASFormer: Transformer for Action Segmentation
Fangqiu Yi
Hongyu Wen
Tingting Jiang
ViT
79
172
0
16 Oct 2021
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
Shentong Mo
Xiao Fu
Chenyang Hong
Yizhen Chen
Yuxuan Zheng
Xiangru Tang
Zhiqiang Shen
Eric P. Xing
Yanyan Lan
AI4CE
23
19
0
11 Oct 2021
Global Vision Transformer Pruning with Hessian-Aware Saliency
Huanrui Yang
Hongxu Yin
Maying Shen
Pavlo Molchanov
Hai Helen Li
Jan Kautz
ViT
30
39
0
10 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
47
78
0
06 Oct 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
218
1,213
0
05 Oct 2021
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
212
487
0
01 Oct 2021
UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
ViT
114
20
0
29 Sep 2021
PnP-DETR: Towards Efficient Visual Analysis with Transformers
Tao Wang
Li Yuan
Yunpeng Chen
Jiashi Feng
Shuicheng Yan
ViT
24
82
0
15 Sep 2021
CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
Tongkun Xu
Weihua Chen
Pichao Wang
Fan Wang
Hao Li
R. L. Jin
ViT
59
215
0
13 Sep 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández-Orallo
35
113
0
12 Sep 2021
Scaled ReLU Matters for Training Vision Transformers
Pichao Wang
Xue Wang
Haowen Luo
Jingkai Zhou
Zhipeng Zhou
Fan Wang
Hao Li
R. L. Jin
19
41
0
08 Sep 2021
FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting
R. Liu
Hanming Deng
Yangyi Huang
Xiaoyu Shi
Lewei Lu
Wenxiu Sun
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
27
124
0
07 Sep 2021
Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation
Ziniu Wan
Zhengjia Li
Maoqing Tian
Jianbo Liu
Shuai Yi
Hongsheng Li
3DH
35
80
0
06 Sep 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
46
105
0
30 Aug 2021
Do Vision Transformers See Like Convolutional Neural Networks?
M. Raghu
Thomas Unterthiner
Simon Kornblith
Chiyuan Zhang
Alexey Dosovitskiy
ViT
67
924
0
19 Aug 2021
Causal Attention for Unbiased Visual Recognition
Tan Wang
Chan Zhou
Qianru Sun
Hanwang Zhang
OOD
CML
32
108
0
19 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
183
476
0
12 Aug 2021
RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami
Masato Taki
27
12
0
09 Aug 2021
TriTransNet: RGB-D Salient Object Detection with a Triplet Transformer Embedding Network
Zhengyi Liu
Yuan Wang
Zhengzheng Tu
Yun Xiao
Bin Tang
ViT
32
142
0
09 Aug 2021
Armour: Generalizable Compact Self-Attention for Vision Transformers
Lingchuan Meng
ViT
21
3
0
03 Aug 2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu
Zhijie Zhang
Mengdan Zhang
Kekai Sheng
Ke Li
Weiming Dong
Liqing Zhang
Changsheng Xu
Xing Sun
ViT
32
201
0
03 Aug 2021
Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
Junyuan Gao
Maoguo Gong
Xuelong Li
ViT
19
46
0
02 Aug 2021
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
32
257
0
31 Jul 2021
DPT: Deformable Patch-based Transformer for Visual Recognition
Zhiyang Chen
Yousong Zhu
Chaoyang Zhao
Guosheng Hu
Wei Zeng
Jinqiao Wang
Ming Tang
ViT
16
98
0
30 Jul 2021
Query2Label: A Simple Transformer Way to Multi-Label Classification
Shilong Liu
Lei Zhang
Xiao Yang
Hang Su
Jun Zhu
24
187
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
33
231
0
21 Jul 2021
All the attention you need: Global-local, spatial-channel attention for image retrieval
Chull Hwan Song
Hye Joo Han
Yannis Avrithis
16
39
0
16 Jul 2021
A Comparative Study of Deep Learning Classification Methods on a Small Environmental Microorganism Image Dataset (EMDS-6): from Convolutional Neural Networks to Visual Transformers
Penghui Zhao
Chen Li
M. Rahaman
Hao Xu
Hechen Yang
Hongzan Sun
Tao Jiang
M. Grzegorzek
VLM
30
39
0
16 Jul 2021
Visual Parser: Representing Part-whole Hierarchies with Transformers
Shuyang Sun
Xiaoyu Yue
S. Bai
Philip Torr
50
27
0
13 Jul 2021
Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition
Mouath Aouayeb
W. Hamidouche
Catherine Soladié
K. Kpalma
Renaud Séguier
ViT
28
57
0
07 Jul 2021
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
Zhiwei Hao
Jianyuan Guo
Ding Jia
Kai Han
Yehui Tang
Chao Zhang
Dacheng Tao
Yunhe Wang
ViT
33
68
0
03 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
36
259
0
01 Jul 2021
Global Filter Networks for Image Classification
Yongming Rao
Wenliang Zhao
Zheng Zhu
Jiwen Lu
Jie Zhou
ViT
28
450
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
42
428
0
01 Jul 2021
Augmented Shortcuts for Vision Transformers
Yehui Tang
Kai Han
Chang Xu
An Xiao
Yiping Deng
Chao Xu
Yunhe Wang
ViT
14
39
0
30 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
45
26
0
28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
50
1,607
0
25 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
52
314
0
24 Jun 2021
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan
Yikang Shen
Yi Ding
Zhangyang Wang
Rogerio Feris
A. Oliva
VLM
ViT
39
153
0
23 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li Yuan
Ming-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
24
205
0
23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Ming-Ming Cheng
ViT
29
219
0
22 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
37
127
0
21 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
42
499
0
17 Jun 2021
Space-time Mixing Attention for Video Transformer
Adrian Bulat
Juan-Manuel Perez-Rua
Swathikiran Sudhakaran
Brais Martínez
Georgios Tzimiropoulos
ViT
30
124
0
10 Jun 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
17
575
0
10 Jun 2021
CAT: Cross Attention in Vision Transformer
Hezheng Lin
Xingyi Cheng
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Qing Song
Wei Yuan
ViT
32
149
0
10 Jun 2021