How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

18 June 2021
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
    ViT

Papers citing "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"

50 / 415 papers shown
A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko
Francesco Croce
Maximilian Müller
Matthias Hein
Nicolas Flammarion
3DH
13
54
0
14 Feb 2023
A Comprehensive Study of Modern Architectures and Regularization Approaches on CheXpert5000
Sontje Ihler
Felix Kuhnke
Svenja Spindeldreier
22
1
0
13 Feb 2023
Spatial Functa: Scaling Functa to ImageNet Classification and Generation
Matthias Bauer
Emilien Dupont
Andy Brock
Dan Rosenbaum
Jonathan Richard Schwarz
Hyunjik Kim
DiffM
36
35
0
06 Feb 2023
Dual PatchNorm
Manoj Kumar
Mostafa Dehghani
N. Houlsby
UQCV
ViT
29
11
0
02 Feb 2023
Inference Time Evidences of Adversarial Attacks for Forensic on Transformers
Hugo Lemarchant
Liang Li
Yiming Qian
Yuta Nakashima
Hajime Nagahara
ViT
AAML
43
0
0
31 Jan 2023
Benchmarking Robustness to Adversarial Image Obfuscations
Florian Stimberg
Ayan Chakrabarti
Chun-Ta Lu
Hussein Hazimeh
Otilia Stretcu
...
Merve Kaya
Cyrus Rashtchian
Ariel Fuxman
Mehmet Tek
Sven Gowal
AAML
32
10
0
30 Jan 2023
Out of Distribution Performance of State of Art Vision Model
Salman Rahman
W. Lee
37
2
0
25 Jan 2023
Holistically Explainable Vision Transformers
Moritz D Boehle
Mario Fritz
Bernt Schiele
ViT
38
9
0
20 Jan 2023
Does progress on ImageNet transfer to real-world datasets?
Alex Fang
Simon Kornblith
Ludwig Schmidt
VLM
23
34
0
11 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
33
24
0
05 Jan 2023
Representation Separation for Semantic Segmentation with Vision Transformers
Yuanduo Hong
Huihui Pan
Weichao Sun
Xinghu Yu
Huijun Gao
ViT
28
5
0
28 Dec 2022
RangeAugment: Efficient Online Augmentation with Range Learning
Sachin Mehta
Saeid Naderiparizi
Fartash Faghri
Maxwell Horton
Lailin Chen
Ali Farhadi
Oncel Tuzel
Mohammad Rastegari
26
6
0
20 Dec 2022
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Simone Klenk
David Bonello
Lukas Koestler
Nikita Araslanov
Daniel Cremers
29
23
0
20 Dec 2022
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
40
2,024
0
19 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
45
90
0
15 Dec 2022
Learning useful representations for shifting tasks and distributions
Jianyu Zhang
Léon Bottou
OOD
34
13
0
14 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
39
4
0
13 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
22
35
0
12 Dec 2022
Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet
Yannic Neuhaus
Maximilian Augustin
Valentyn Boreiko
Matthias Hein
AAML
36
30
0
09 Dec 2022
Mitigation of Spatial Nonstationarity with Vision Transformers
Lei Liu
Javier E. Santos
Maša Prodanović
Michael J. Pyrcz
12
4
0
09 Dec 2022
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
28
165
0
08 Dec 2022
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
38
33
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
24
10
0
05 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
19
32
0
01 Dec 2022
Part-based Face Recognition with Vision Transformers
Zhonglin Sun
Georgios Tzimiropoulos
ViT
25
15
0
30 Nov 2022
Differentially Private Image Classification from Features
Harsh Mehta
Walid Krichene
Abhradeep Thakurta
Alexey Kurakin
Ashok Cutkosky
52
7
0
24 Nov 2022
How to Fine-Tune Vision Models with SGD
Ananya Kumar
Ruoqi Shen
Sébastien Bubeck
Suriya Gunasekar
VLM
8
29
0
17 Nov 2022
Using Human Perception to Regularize Transfer Learning
Justin Dulay
Walter J. Scheirer
24
8
0
15 Nov 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
36
3
0
14 Nov 2022
Harmonizing the object recognition strategies of deep neural networks with humans
Thomas Fel
Ivan Felipe
Drew Linsley
Thomas Serre
36
71
0
08 Nov 2022
Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen
Zhenglun Kong
Minghai Qin
Peiyan Dong
Geng Yuan
Xin Meng
Hao Tang
Xiaolong Ma
Yanzhi Wang
30
6
0
02 Nov 2022
A simple, efficient and scalable contrastive masked autoencoder for learning visual representations
Shlok Kumar Mishra
Joshua Robinson
Huiwen Chang
David Jacobs
Aaron Sarna
Aaron Maschinot
Dilip Krishnan
DiffM
43
30
0
30 Oct 2022
Single-Shot Domain Adaptation via Target-Aware Generative Augmentation
Rakshith Subramanyam
Kowshik Thopalli
Spring Berman
P. Turaga
Jayaraman J. Thiagarajan
TTA
31
1
0
29 Oct 2022
Facial Action Unit Detection and Intensity Estimation from Self-supervised Representation
Bowen Ma
Rudong An
Wei Zhang
Yu-qiong Ding
Zeng Zhao
Rongsheng Zhang
Tangjie Lv
Changjie Fan
Zhipeng Hu
CVBM
62
19
0
28 Oct 2022
Deep Model Reassembly
Xingyi Yang
Zhou Daquan
Songhua Liu
Jingwen Ye
Xinchao Wang
MoMe
20
120
0
24 Oct 2022
Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification
Junfei Xiao
Yutong Bai
Alan Yuille
Zongwei Zhou
MedIm
ViT
37
59
0
23 Oct 2022
Similarity of Neural Architectures using Adversarial Attack Transferability
Jaehui Hwang
Dongyoon Han
Byeongho Heo
Song Park
Sanghyuk Chun
Jong-Seok Lee
AAML
32
1
0
20 Oct 2022
A Survey of Computer Vision Technologies In Urban and Controlled-environment Agriculture
Jiayun Luo
Boyang Albert Li
Cyril Leung
53
11
0
20 Oct 2022
How Does a Deep Learning Model Architecture Impact Its Privacy? A Comprehensive Study of Privacy Attacks on CNNs and Transformers
Guangsheng Zhang
B. Liu
Huan Tian
Tianqing Zhu
Ming Ding
Wanlei Zhou
PILM
MIACV
20
5
0
20 Oct 2022
OpenEarthMap: A Benchmark Dataset for Global High-Resolution Land Cover Mapping
J. Xia
Naoto Yokoya
B. Adriano
Clifford Broni-Bediako
VLM
36
69
0
19 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
51
417
0
17 Oct 2022
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Denis Kuznedelev
Eldar Kurtic
Elias Frantar
Dan Alistarh
VLM
ViT
16
11
0
14 Oct 2022
When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture
Yi Mo
Dongxian Wu
Yifei Wang
Yiwen Guo
Yisen Wang
ViT
45
52
0
14 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
31
47
0
13 Oct 2022
How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization
Jonas Geiping
Micah Goldblum
Gowthami Somepalli
Ravid Shwartz-Ziv
Tom Goldstein
A. Wilson
26
35
0
12 Oct 2022
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Chaerin Kong
D. Jeon
Oh-Hun Kwon
Nojun Kwak
DiffM
19
16
0
12 Oct 2022
SegViT: Semantic Segmentation with Plain Vision Transformers
Bowen Zhang
Zhi Tian
Quan Tang
Xiangxiang Chu
Xiaolin K. Wei
Chunhua Shen
Yifan Liu
ViT
21
134
0
12 Oct 2022
A Simple Baseline that Questions the Use of Pretrained-Models in Continual Learning
Paul Janson
Wenxuan Zhang
Rahaf Aljundi
Mohamed Elhoseiny
VLM
SSL
CLL
32
52
0
10 Oct 2022
APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations
Elan Rosenfeld
Preetum Nakkiran
Hadi Pouransari
Oncel Tuzel
Fartash Faghri
68
6
0
08 Oct 2022