How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers (arXiv:2106.10270)
18 June 2021
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
Papers citing "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"
Showing 50 of 415 citing papers.
Better plain ViT baselines for ImageNet-1k
Lucas Beyer
Xiaohua Zhai
Alexander Kolesnikov
ViT
VLM
33
111
0
03 May 2022
Revealing Occlusions with 4D Neural Fields
Basile Van Hoorick
Purva Tendulkar
Dídac Surís
Dennis Park
Simon Stent
Carl Vondrick
27
16
0
22 Apr 2022
Safe Self-Refinement for Transformer-based Domain Adaptation
Tao Sun
Cheng Lu
Tianshuo Zhang
Haibin Ling
ViT
21
80
0
16 Apr 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
42
389
0
14 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Liuliang Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
Wei Liu
ViT
16
90
0
03 Apr 2022
Task Adaptive Parameter Sharing for Multi-Task Learning
Matthew Wallingford
Hao Li
Alessandro Achille
Avinash Ravichandran
Charless C. Fowlkes
Rahul Bhotika
Stefano Soatto
MoMe
25
59
0
30 Mar 2022
Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis
Simon Dahan
Abdulah Fawaz
Logan Z. J. Williams
Chunhui Yang
Timothy S. Coalson
M. Glasser
A. Edwards
Daniel Rueckert
E. C. Robinson
MedIm
ViT
40
20
0
30 Mar 2022
Semantic Segmentation by Early Region Proxy
Yifan Zhang
Bo Pang
Cewu Lu
ViT
52
29
0
26 Mar 2022
StructToken: Rethinking Semantic Segmentation with Structural Prior
Fangjian Lin
Zhanhao Liang
Miao Zheng
Junjun He
Kaibing Chen
Sheng Tian
23
48
0
23 Mar 2022
Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework
Botao Ye
Hong Chang
Bingpeng Ma
Shiguang Shan
Xilin Chen
ViT
27
275
0
22 Mar 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu
Mu Cai
Yong Jae Lee
3DPC
21
164
0
21 Mar 2022
Hyperbolic Vision Transformers: Combining Improvements in Metric Learning
Aleksandr Ermolov
L. Mirvakhabova
Valentin Khrulkov
N. Sebe
Ivan V. Oseledets
25
100
0
21 Mar 2022
Disentangling Architecture and Training for Optical Flow
Deqing Sun
Charles Herrmann
F. Reda
Michael Rubinstein
David Fleet
William T. Freeman
3DPC
OOD
66
34
0
21 Mar 2022
Three things everyone should know about Vision Transformers
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Jakob Verbeek
Hervé Jégou
ViT
24
119
0
18 Mar 2022
Are Vision Transformers Robust to Spurious Correlations?
Soumya Suvra Ghosal
Yifei Ming
Yixuan Li
ViT
27
28
0
17 Mar 2022
SC2 Benchmark: Supervised Compression for Split Computing
Yoshitomo Matsubara
Ruihan Yang
Marco Levorato
Stephan Mandt
14
18
0
16 Mar 2022
2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips
M. J. Horry
Subrata Chakraborty
B. Pradhan
N. Shukla
Sanjoy Paul
28
1
0
15 Mar 2022
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference
Zhongzhi Yu
Y. Fu
Shang Wu
Mengquan Li
Haoran You
Yingyan Lin
28
1
0
15 Mar 2022
Retrieval Augmented Classification for Long-Tail Visual Recognition
Alex Long
Wei Yin
Thalaiyasingam Ajanthan
Vu-Linh Nguyen
Pulak Purkait
Ravi Garg
Alan Blair
Chunhua Shen
Anton Van Den Hengel
21
107
0
22 Feb 2022
Meta Knowledge Distillation
Jihao Liu
Boxiao Liu
Hongsheng Li
Yu Liu
18
25
0
16 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
44
465
0
14 Feb 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun
Haoyu Ma
Guoliang Kang
Yi Ding
Tianlong Chen
Xiaolong Ma
Zhangyang Wang
Yanzhi Wang
ViT
33
45
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
42
4,980
0
10 Jan 2022
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
34
144
0
27 Dec 2021
MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation
Zhongzhi Yu
Y. Fu
Sicheng Li
Chaojian Li
Yingyan Lin
ViT
33
19
0
21 Dec 2021
Learned Queries for Efficient Local Attention
Moab Arar
Ariel Shamir
Amit H. Bermano
ViT
41
29
0
21 Dec 2021
How to augment your ViTs? Consistency loss and StyleAug, a random style transfer augmentation
Akash Umakantha
João Dias Semedo
S. Golestaneh
Wan-Yi Lin
ViT
28
0
0
16 Dec 2021
PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
Dan Hendrycks
Andy Zou
Mantas Mazeika
Leonard Tang
Bo-wen Li
D. Song
Jacob Steinhardt
UQCV
23
137
0
09 Dec 2021
Self-supervised Video Transformer
Kanchana Ranasinghe
Muzammal Naseer
Salman Khan
F. Khan
Michael S. Ryoo
ViT
39
84
0
02 Dec 2021
Pyramid Adversarial Training Improves ViT Performance
Charles Herrmann
Kyle Sargent
Lu Jiang
Ramin Zabih
Huiwen Chang
Ce Liu
Dilip Krishnan
Deqing Sun
ViT
29
56
0
30 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
25
6
0
26 Nov 2021
Discrete Representations Strengthen Vision Transformer Robustness
Chengzhi Mao
Lu Jiang
Mostafa Dehghani
Carl Vondrick
Rahul Sukthankar
Irfan Essa
ViT
27
43
0
20 Nov 2021
Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints
Jaesin Ahn
Jiuk Hong
Jeongwoo Ju
Heechul Jung
ViT
32
3
0
19 Nov 2021
Benchmarking and scaling of deep learning models for land cover image classification
Ioannis Papoutsis
N. Bountos
Angelos Zavras
Dimitrios Michail
Christos Tryfonopoulos
24
55
0
18 Nov 2021
TorchGeo: Deep Learning With Geospatial Data
Adam J. Stewart
Caleb Robinson
Isaac Corley
Anthony Ortiz
J. L. Ferres
Arindam Banerjee
3DPC
32
76
0
17 Nov 2021
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
18
34
0
16 Nov 2021
Improved Robustness of Vision Transformer via PreLayerNorm in Patch Embedding
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Dong Gu Lee
Wonseok Jeong
Sang Woo Kim
ViT
13
8
0
16 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Xiao Wang
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
48
543
0
15 Nov 2021
Identifying the key components in ResNet-50 for diabetic retinopathy grading from fundus images: a systematic investigation
Yijin Huang
Li Lin
Pujin Cheng
Junyan Lyu
Roger Tam
Xiaoying Tang
29
32
0
27 Oct 2021
The Efficiency Misnomer
Mostafa Dehghani
Anurag Arnab
Lucas Beyer
Ashish Vaswani
Yi Tay
34
99
0
25 Oct 2021
SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani
A. Gritsenko
Anurag Arnab
Matthias Minderer
Yi Tay
46
68
0
18 Oct 2021
Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation
Yao Qin
Chiyuan Zhang
Ting Chen
Balaji Lakshminarayanan
Alex Beutel
Xuezhi Wang
ViT
50
42
0
15 Oct 2021
UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Jihao Liu
Hongsheng Li
Guanglu Song
Xin Huang
Yu Liu
ViT
37
35
0
08 Oct 2021
Sparse MoEs meet Efficient Ensembles
J. Allingham
F. Wenzel
Zelda E. Mariet
Basil Mustafa
J. Puigcerver
...
Balaji Lakshminarayanan
Jasper Snoek
Dustin Tran
Carlos Riquelme Ruiz
Rodolphe Jenatton
MoE
46
21
0
07 Oct 2021
Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs
Philipp Benz
Soomin Ham
Chaoning Zhang
Adil Karjauv
In So Kweon
AAML
ViT
47
78
0
06 Oct 2021
Fine-tuning Vision Transformers for the Prediction of State Variables in Ising Models
Onur Kara
Arijit Sehanobish
H. Corzo
20
4
0
28 Sep 2021
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
Xiaoxia Wu
Yuege Xie
S. Du
Rachel A. Ward
ODL
19
7
0
17 Sep 2021
Passive Attention in Artificial Neural Networks Predicts Human Visual Selectivity
Thomas A. Langlois
H. C. Zhao
Erin Grant
Ishita Dasgupta
Thomas L. Griffiths
Nori Jacoby
47
15
0
14 Jul 2021