arXiv:2101.11986
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
28 January 2021
Li Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
Papers citing
"Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
50 / 396 papers shown
Title
Improving the Transferability of Adversarial Examples with Restructure Embedded Patches
Huipeng Zhou
Yu-an Tan
Yajie Wang
Haoran Lyu
Shan-Hung Wu
Yuan-zhang Li
ViT
19
4
0
27 Apr 2022
Boosting Adversarial Transferability of MLP-Mixer
Haoran Lyu
Yajie Wang
Yu-an Tan
Huipeng Zhou
Yuhang Zhao
Quan-xin Zhang
AAML
27
1
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
21
4
0
26 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
22
53
0
18 Apr 2022
Application of Transfer Learning and Ensemble Learning in Image-level Classification for Breast Histopathology
Yuchao Zheng
Chen Li
Xiaomin Zhou
Hao Chen
Hao Xu
...
Haiqing Zhang
Xirong Li
Hongzan Sun
Xinyu Huang
M. Grzegorzek
33
55
0
18 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
28
123
0
14 Apr 2022
Neighborhood Attention Transformer
Ali Hassani
Steven Walton
Jiacheng Li
Shengjia Li
Humphrey Shi
ViT
AI4TS
36
253
0
14 Apr 2022
TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation
Wenqiang Zhang
Zilong Huang
Guozhong Luo
Tao Chen
Xinggang Wang
Wenyu Liu
Gang Yu
Chunhua Shen
ViT
22
199
0
12 Apr 2022
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
51
240
0
07 Apr 2022
Learning Local and Global Temporal Contexts for Video Semantic Segmentation
Guolei Sun
Yun Liu
Henghui Ding
Min Wu
Luc Van Gool
30
32
0
07 Apr 2022
An Empirical Study of Remote Sensing Pretraining
Di Wang
Jing Zhang
Bo Du
Guisong Xia
Dacheng Tao
EDL
36
190
0
06 Apr 2022
Towards An End-to-End Framework for Flow-Guided Video Inpainting
Zerui Li
Cheng Lu
Jia Qin
Chunle Guo
Ming-Ming Cheng
41
149
0
06 Apr 2022
MixFormer: Mixing Features across Windows and Dimensions
Qiang Chen
Qiman Wu
Jian Wang
Qinghao Hu
T. Hu
Errui Ding
Jian Cheng
Jingdong Wang
MDE
ViT
31
101
0
06 Apr 2022
Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation
Wangbo Zhao
Kai Wang
Xiangxiang Chu
Fuzhao Xue
Xinchao Wang
Yang You
29
21
0
06 Apr 2022
BatchFormerV2: Exploring Sample Relationships for Dense Representation Learning
Zhi Hou
Baosheng Yu
Chaoyue Wang
Yibing Zhan
Dacheng Tao
ViT
29
11
0
04 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Li Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
Wei Liu
ViT
16
90
0
03 Apr 2022
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Tri Dao
Beidi Chen
N. Sohoni
Arjun D Desai
Michael Poli
Jessica Grogan
Alexander Liu
Aniruddh Rao
Atri Rudra
Christopher Ré
22
87
0
01 Apr 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
11
61
0
29 Mar 2022
SepViT: Separable Vision Transformer
Wei Li
Xing Wang
Xin Xia
Jie Wu
Jiashi Li
Xuefeng Xiao
Min Zheng
Shiping Wen
ViT
26
40
0
29 Mar 2022
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
37
108
0
28 Mar 2022
Spatially Multi-conditional Image Generation
Ritika Chakraborty
Nikola Popovic
D. Paudel
Thomas Probst
Luc Van Gool
24
1
0
25 Mar 2022
Self-supervised Video-centralised Transformer for Video Face Clustering
Yujiang Wang
Mingzhi Dong
Jie Shen
Yi-Si Luo
Yiming Lin
Pingchuan Ma
Stavros Petridis
M. Pantic
ViT
26
3
0
24 Mar 2022
Transformers Meet Visual Learning Understanding: A Comprehensive Review
Yuting Yang
Licheng Jiao
Xuantong Liu
F. Liu
Shuyuan Yang
Zhixi Feng
Xu Tang
ViT
MedIm
27
28
0
24 Mar 2022
Beyond Fixation: Dynamic Window Visual Transformer
Pengzhen Ren
Changlin Li
Guangrun Wang
Yun Xiao
Qing Du
Xiaodan Liang
Xiaojun Chang
ViT
28
32
0
24 Mar 2022
Unsupervised Salient Object Detection with Spectral Cluster Voting
Gyungin Shin
Samuel Albanie
Weidi Xie
24
65
0
23 Mar 2022
Training-free Transformer Architecture Search
Qinqin Zhou
Kekai Sheng
Xiawu Zheng
Ke Li
Xing Sun
Yonghong Tian
Jie Chen
Rongrong Ji
ViT
34
46
0
23 Mar 2022
Meta-attention for ViT-backed Continual Learning
Mengqi Xue
Haofei Zhang
Jie Song
Mingli Song
CLL
32
42
0
22 Mar 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
19
53
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
28
5
0
20 Mar 2022
HIPA: Hierarchical Patch Transformer for Single Image Super Resolution
Qing Cai
Yiming Qian
Jinxing Li
Junjie Lv
Yee-Hong Yang
Feng Wu
Dafan Zhang
25
28
0
19 Mar 2022
A Dual Weighting Label Assignment Scheme for Object Detection
Shuai Li
Chenhang He
Ruihuang Li
Lei Zhang
30
79
0
18 Mar 2022
MatchFormer: Interleaving Attention in Transformers for Feature Matching
Qing Wang
Jiaming Zhang
Kailun Yang
Kunyu Peng
Rainer Stiefelhagen
ViT
44
141
0
17 Mar 2022
Towards Data-Efficient Detection Transformers
Wen Wang
Jing Zhang
Yang Cao
Yongliang Shen
Dacheng Tao
ViT
23
59
0
17 Mar 2022
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
Yang He
Weihan Liang
Dongyang Zhao
Hong-Yu Zhou
Weifeng Ge
Yizhou Yu
Wenqiang Zhang
ViT
32
45
0
17 Mar 2022
Smoothing Matters: Momentum Transformer for Domain Adaptive Semantic Segmentation
Runfa Chen
Yu Rong
Shangmin Guo
Jiaqi Han
Gang Hua
Tingyang Xu
Wenbing Huang
ViT
15
20
0
15 Mar 2022
Deep Transformers Thirst for Comprehensive-Frequency Data
R. Xia
Chao Xue
Boyu Deng
Fang Wang
Jingchao Wang
ViT
25
0
0
14 Mar 2022
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang
Wenxiao Wang
Francis E. H. Tay
Wei Liu
Yonghong Tian
Li Yuan
3DPC
ViT
33
454
0
13 Mar 2022
DFTR: Depth-supervised Fusion Transformer for Salient Object Detection
Heqin Zhu
Xu Sun
Yuexiang Li
Kai Ma
S. Kevin Zhou
Yefeng Zheng
ViT
44
9
0
12 Mar 2022
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
Tianlong Chen
Zhenyu (Allen) Zhang
Yu Cheng
Ahmed Hassan Awadallah
Zhangyang Wang
ViT
41
37
0
12 Mar 2022
Anti-Oversmoothing in Deep Vision Transformers via the Fourier Domain Analysis: From Theory to Practice
Peihao Wang
Wenqing Zheng
Tianlong Chen
Zhangyang Wang
ViT
27
127
0
09 Mar 2022
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
Youwei Pang
Xiaoqi Zhao
Tian-Zhu Xiang
Lihe Zhang
Huchuan Lu
ObjD
26
213
0
05 Mar 2022
LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network
Zhigang Jiang
Zhongzheng Xiang
Jinhua Xu
Mingbi Zhao
ViT
3DV
27
34
0
03 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work
Khawar Islam
ViT
28
45
0
03 Mar 2022
3DCTN: 3D Convolution-Transformer Network for Point Cloud Classification
Dening Lu
Qian Xie
Linlin Xu
Jonathan Li
3DV
19
68
0
02 Mar 2022
A Data-scalable Transformer for Medical Image Segmentation: Architecture, Model Efficiency, and Benchmark
Yunhe Gao
Mu Zhou
Ding Liu
Zhennan Yan
Shaoting Zhang
Dimitris N. Metaxas
ViT
MedIm
26
68
0
28 Feb 2022
SUNet: Swin Transformer UNet for Image Denoising
Chi-Mao Fan
Tsung-Jung Liu
Kuan-Hsien Liu
ViT
39
112
0
28 Feb 2022
CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising
Dayang Wang
Fenglei Fan
Zhan Wu
R. Liu
Fei-Yue Wang
Hengyong Yu
ViT
MedIm
35
121
0
28 Feb 2022
Factorizer: A Scalable Interpretable Approach to Context Modeling for Medical Image Segmentation
Pooya Ashtari
Diana Sima
L. De Lathauwer
D. Sappey-Marinier
F. Maes
Sabine Van Huffel
ViT
MedIm
26
35
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
192
499
0
22 Feb 2022