2101.11986
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
28 January 2021
Li Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
Papers citing
"Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
50 / 396 papers shown
Title
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
S. Wadekar
Abhishek Chaurasia
ViT
103
87
0
30 Sep 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
39
21
0
21 Sep 2022
Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL
Fengzhuo Zhang
Boyi Liu
Kaixin Wang
Vincent Y. F. Tan
Zhuoran Yang
Zhaoran Wang
OffRL
LRM
51
10
0
20 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition
Yang Wu
Pai Peng
Zhenyu Zhang
Yanyan Zhao
Bing Qin
27
1
0
20 Sep 2022
Relaxed Attention for Transformer Models
Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
KELM
29
11
0
20 Sep 2022
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Haoyu Ma
Zhe Wang
Yifei Chen
Deying Kong
Liangjian Chen
Xingwei Liu
Xiangyi Yan
Hao Tang
Xiaohui Xie
ViT
35
47
0
16 Sep 2022
Efficient Quantized Sparse Matrix Operations on Tensor Cores
Shigang Li
Kazuki Osawa
Torsten Hoefler
82
31
0
14 Sep 2022
MRL: Learning to Mix with Attention and Convolutions
Shlok Mohta
Hisahiro Suganuma
Yoshiki Tanaka
28
2
0
30 Aug 2022
Exploring Adversarial Robustness of Vision Transformers in the Spectral Perspective
Gihyun Kim
Juyeop Kim
Jong-Seok Lee
AAML
ViT
24
4
0
20 Aug 2022
Improved Image Classification with Token Fusion
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
19
0
0
19 Aug 2022
DropKey
Bonan Li
Yinhan Hu
Xuecheng Nie
Congying Han
Xiangjian Jiang
Tiande Guo
Luoqi Liu
15
11
0
04 Aug 2022
Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
Yingyi Chen
Xiaoke Shen
Yahui Liu
Qinghua Tao
Johan A. K. Suykens
AAML
ViT
28
22
0
25 Jul 2022
An Impartial Take to the CNN vs Transformer Robustness Contest
Francesco Pinto
Philip Torr
P. Dokania
UQCV
AAML
30
48
0
22 Jul 2022
Geodesic-Former: a Geodesic-Guided Few-shot 3D Point Cloud Instance Segmenter
T. Ngo
Khoi Duc Minh Nguyen
3DPC
19
4
0
22 Jul 2022
Locality Guidance for Improving Vision Transformers on Tiny Datasets
Kehan Li
Runyi Yu
Zhennan Wang
Li-ming Yuan
Guoli Song
Jie Chen
ViT
32
43
0
20 Jul 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
19
26
0
20 Jul 2022
HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation
Moein Heidari
A. Kazerouni
Milad Soltany Kadarvish
Reza Azad
Ehsan Khodapanah Aghdam
Julien Cohen-Adad
Dorit Merhof
MedIm
ViT
25
178
0
18 Jul 2022
ESFPNet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video
Qi Chang
Danish Ahmad
J.W. Toth
R. Bascom
W. Higgins
MedIm
21
49
0
15 Jul 2022
Weakly Supervised Video Salient Object Detection via Point Supervision
Shuyong Gao
Hao Xing
Wei Zhang
Yan Wang
Qianyu Guo
Wenqiang Zhang
33
24
0
15 Jul 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
17
149
0
12 Jul 2022
Dual Vision Transformer
Ting Yao
Yehao Li
Yingwei Pan
Yu Wang
Xiaoping Zhang
Tao Mei
ViT
141
75
0
11 Jul 2022
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
Xin Jia
Changlei Dongye
Yan-Tsung Peng
ViT
29
18
0
09 Jul 2022
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications
Muhammad Maaz
Abdelrahman M. Shaker
Hisham Cholakkal
Salman Khan
Syed Waqas Zamir
Rao Muhammad Anwer
Fahad Shahbaz Khan
ViT
29
184
0
21 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
SimA: Simple Softmax-free Attention for Vision Transformers
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
21
25
0
17 Jun 2022
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes
Rui Zhu
Zhengqin Li
J. Matai
Fatih Porikli
Manmohan Chandraker
ViT
43
45
0
16 Jun 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers
Yuxuan Zhou
Wangmeng Xiang
Chuan Li
Biao Wang
Xihan Wei
Lei Zhang
M. Keuper
Xia Hua
ViT
34
15
0
15 Jun 2022
Transformers are Meta-Reinforcement Learners
Luckeciano C. Melo
OffRL
41
50
0
14 Jun 2022
IL-MCAM: An interactive learning and multi-channel attention mechanism-based weakly supervised colorectal histopathology image classification approach
Hao Chen
Chen Li
Xirong Li
M. Rahaman
Weiming Hu
...
Wanli Liu
Changhao Sun
Hongzan Sun
Xinyu Huang
M. Grzegorzek
HAI
32
99
0
07 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
251
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
23
347
0
02 Jun 2022
CVM-Cervix: A Hybrid Cervical Pap-Smear Image Classification Framework Using CNN, Visual Transformer and Multilayer Perceptron
Wanli Liu
Chen Li
N. Xu
Tao Jiang
M. Rahaman
...
Weiming Hu
Hao Chen
Changhao Sun
Yudong Yao
M. Grzegorzek
9
132
0
02 Jun 2022
MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet
Nan Wang
Shaohui Lin
Xiaoxiao Li
Ke Li
Yunhang Shen
Yue Gao
Lizhuang Ma
ViT
MedIm
46
33
0
02 Jun 2022
XBound-Former: Toward Cross-scale Boundary Modeling in Transformers
Jiacheng Wang
Fei Chen
Yuxi Ma
Liansheng Wang
Zhaodong Fei
Jia Shuai
Xiangdong Tang
Qichao Zhou
Jing Qin
ViT
MedIm
27
63
0
02 Jun 2022
HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Xiaosong Zhang
Yunjie Tian
Wei Huang
QiXiang Ye
Qi Dai
Lingxi Xie
Qi Tian
64
26
0
30 May 2022
WaveMix: A Resource-efficient Neural Network for Image Analysis
Pranav Jeevan
Kavitha Viswanathan
S. AnanduA
A. Sethi
20
20
0
28 May 2022
Multi-Task Learning with Multi-Query Transformer for Dense Prediction
Yangyang Xu
Xiangtai Li
Haobo Yuan
Yibo Yang
Lefei Zhang
ViT
28
45
0
28 May 2022
Object-wise Masked Autoencoders for Fast Pre-training
Jiantao Wu
Shentong Mo
ViT
OCL
25
15
0
28 May 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
78
2,024
0
27 May 2022
Inception Transformer
Chenyang Si
Weihao Yu
Pan Zhou
Yichen Zhou
Xinchao Wang
Shuicheng Yan
ViT
28
187
0
25 May 2022
Super Vision Transformer
Mingbao Lin
Yonghong Tian
Yuxin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
46
20
0
23 May 2022
SelfReformer: Self-Refined Network with Transformer for Salient Object Detection
Y. Yun
Weisi Lin
ViT
60
28
0
23 May 2022
Boosting Camouflaged Object Detection with Dual-Task Interactive Transformer
Zheng Liu
Zhili Zhang
Wei Wu
32
46
0
21 May 2022
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
113
73
0
20 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
30
159
0
16 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Automatic segmentation of meniscus based on MAE self-supervision and point-line weak supervision paradigm
Yuhan Xie
Kexin Jiang
Zhiyong Zhang
Shaolong Chen
Xiaodong Zhang
Changzhen Qiu
24
1
0
07 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
L. Dudziak
Hongsheng Li
Georgios Tzimiropoulos
Brais Martínez
ViT
31
181
0
06 May 2022
Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami
Masato Taki
VLM
ViT
16
78
0
04 May 2022
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
Xianing Chen
Qiong Cao
Yujie Zhong
Jing Zhang
Shenghua Gao
Dacheng Tao
ViT
40
76
0
27 Apr 2022