ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.11986
  4. Cited By
Tokens-to-Token ViT: Training Vision Transformers from Scratch on
  ImageNet

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

28 January 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
    ViT
ArXivPDFHTML

Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"

50 / 396 papers shown
Title
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and
  Effective Fusion of Local, Global and Input Features
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
103
87
0
30 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question
  Answering
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
39
21
0
21 Sep 2022
Relational Reasoning via Set Transformers: Provable Efficiency and
  Applications to MARL
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
Fengzhuo Zhang
Boyi Liu
Kaixin Wang
Vincent Y. F. Tan
Zhuoran Yang
Zhaoran Wang
OffRL
LRM
51
10
0
20 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention
  for Multi-modal Emotion Recognition
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition
Yang Wu
Pai Peng
Zhenyu Zhang
Yanyan Zhao
Bing Qin
27
1
0
20 Sep 2022
Relaxed Attention for Transformer Models
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human
  pose estimation
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Haoyu Ma
Zhe Wang
Yifei Chen
Deying Kong
Liangjian Chen
Xingwei Liu
Xiangyi Yan
Hao Tang
Xiaohui Xie
ViT
35
47
0
16 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
MRL: Learning to Mix with Attention and Convolutions
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
28
2
0
30 Aug 2022
Exploring Adversarial Robustness of Vision Transformers in the Spectral
  Perspective
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective
Gihyun Kim
Juyeop Kim
Jong-Seok Lee
AAML
ViT
24
4
0
20 Aug 2022
Improved Image Classification with Token Fusion
Improved Image Classification with Token Fusion
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
19
0
0
19 Aug 2022
DropKey
DropKey
Bonan li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
15
11
0
04 Aug 2022
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Yingyi Chen
Xiaoke Shen
Yahui Liu
Qinghua Tao
Johan A. K. Suykens
AAML
ViT
28
22
0
25 Jul 2022
An Impartial Take to the CNN vs Transformer Robustness Contest
An Impartial Take to the CNN vs Transformer Robustness Contest
Francesco Pinto
Philip Torr
P. Dokania
UQCV
AAML
30
48
0
22 Jul 2022
Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance
  Segmenter
Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter
T. Ngo
Khoi Duc Minh Nguyen
3DPC
19
4
0
22 Jul 2022
Locality Guidance for Improving Vision Transformers on Tiny Datasets
Locality Guidance for Improving Vision Transformers on Tiny Datasets
Kehan Li
Runyi Yu
Zhennan Wang
Li-ming Yuan
Guoli Song
Jie Chen
ViT
32
43
0
20 Jul 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
19
26
0
20 Jul 2022
HiFormer: Hierarchical Multi-scale Representations Using Transformers
  for Medical Image Segmentation
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
A. Kazerouni
Milad Soltany Kadarvish
Reza Azad
Ehsan Khodapanah Aghdam
Julien Cohen-Adad
Dorit Merhof
MedIm
ViT
25
178
0
18 Jul 2022
ESFPNet: efficient deep learning architecture for real-time lesion
  segmentation in autofluorescence bronchoscopic video
ESFPNet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video
Qi Chang
Danish Ahmad
J.W. Toth
R. Bascom
W. Higgins
MedIm
21
49
0
15 Jul 2022
Weakly Supervised Video Salient Object Detection via Point Supervision
Weakly Supervised Video Salient Object Detection via Point Supervision
Shuyong Gao
Hao Xing
Wei Zhang
Yan Wang
Qianyu Guo
Wenqiang Zhang
33
24
0
15 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in
  Realistic Industrial Scenarios
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
17
149
0
12 Jul 2022
Dual Vision Transformer
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection
  with Depth Image Classification
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Xin Jia
Changlei Dongye
Yan-Tsung Peng
ViT
29
18
0
09 Jul 2022
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for
  Mobile Vision Applications
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
Muhammad Maaz
Abdelrahman M. Shaker
Hisham Cholakkal
Salman Khan
Syed Waqas Zamir
Rao Muhammad Anwer
Fahad Shahbaz Khan
ViT
29
184
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary
  Algorithm
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
21
25
0
17 Jun 2022
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering
  in Indoor Scenes
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Rui Zhu
Zhengqin Li
J. Matai
Fatih Porikli
Manmohan Chandraker
ViT
43
45
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chuan Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
34
15
0
15 Jun 2022
Transformers are Meta-Reinforcement Learners
Transformers are Meta-Reinforcement Learners
Luckeciano C. Melo
OffRL
41
50
0
14 Jun 2022
IL-MCAM: An interactive learning and multi-channel attention
  mechanism-based weakly supervised colorectal histopathology image
  classification approach
IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach
Hao Chen
Chen Li
Xirong Li
M. Rahaman
Weiming Hu
...
Wanli Liu
Changhao Sun
Hongzan Sun
Xinyu Huang
M. Grzegorzek
HAI
32
99
0
07 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
251
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework
  Using CNN, Visual Transformer and Multilayer Perceptron
CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework Using CNN, Visual Transformer and Multilayer Perceptron
Wanli Liu
Chen Li
N. Xu
Tao Jiang
M. Rahaman
...
Weiming Hu
Hao Chen
Changhao Sun
Yudong Yao
M. Grzegorzek
9
132
0
02 Jun 2022
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet
Nan Wang
Shaohui Lin
Xiaoxiao Li
Ke Li
Yunhang Shen
Yue Gao
Lizhuang Ma
ViT
MedIm
46
33
0
02 Jun 2022
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
Jiacheng Wang
Fei Chen
Yuxi Ma
Liansheng Wang
Zhaodong Fei
Jia Shuai
Xiangdong Tang
Qichao Zhou
Jing Qin
ViT
MedIm
27
63
0
02 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
20
20
0
28 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
28
45
0
28 May 2022
Object-wise Masked Autoencoders for Fast Pre-training
Object-wise Masked Autoencoders for Fast Pre-training
Jiantao Wu
Shentong Mo
ViT
OCL
25
15
0
28 May 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with
  IO-Awareness
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
78
2,024
0
27 May 2022
Inception Transformer
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
28
187
0
25 May 2022
Super Vision Transformer
Super Vision Transformer
Mingbao Lin
Yonghong Tian
Yuxin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
46
20
0
23 May 2022
SelfReformer: Self-Refined Network with Transformer for Salient Object
  Detection
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Y. Yun
Weisi Lin
ViT
60
28
0
23 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive
  Transformer
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Wu
32
46
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision
  Transformers with Locality
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
113
73
0
20 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised
  Semantic Segmentation and Localization
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
30
159
0
16 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Automatic segmentation of meniscus based on MAE self-supervision and
  point-line weak supervision paradigm
Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigm
Yuhan Xie
Kexin Jiang
Zhiyong Zhang
Shaolong Chen
Xiaodong Zhang
Changzhen Qiu
24
1
0
07 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision
  Transformers
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
181
0
06 May 2022
Sequencer: Deep LSTM for Image Classification
Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami
Masato Taki
VLM
ViT
16
78
0
04 May 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision
  Transformers
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
40
76
0
27 Apr 2022
Previous
12345678
Next