2106.10270
Cited By
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
18 June 2021
Andreas Steiner
Alexander Kolesnikov
Xiaohua Zhai
Ross Wightman
Jakob Uszkoreit
Lucas Beyer
ViT
Papers citing
"How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"
50 / 415 papers shown
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
50
23
0
05 Jun 2023
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers
Chenyang Lu
Daan de Geus
Gijs Dubbelman
ViT
25
20
0
03 Jun 2023
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation
Julian Bitterwolf
Maximilian Müller
Matthias Hein
OODD
19
83
0
01 Jun 2023
Diffused Redundancy in Pre-trained Representations
Vedant Nanda
Till Speicher
John P. Dickerson
S. Feizi
Krishna P. Gummadi
Adrian Weller
SSL
23
2
0
31 May 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang
Bhishma Dedhia
N. Jha
ViT
VLM
41
26
0
27 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features
Maksym Andriushchenko
Dara Bahri
H. Mobahi
Nicolas Flammarion
AAML
25
25
0
25 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale
Zhiwei Hao
Jianyuan Guo
Kai Han
Han Hu
Chang Xu
Yunhe Wang
35
16
0
25 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Emanuele Bugliarello
Aida Nematzadeh
Lisa Anne Hendricks
SSL
24
5
0
23 May 2023
Target-Aware Generative Augmentations for Single-Shot Adaptation
Kowshik Thopalli
Rakshith Subramanyam
P. Turaga
Jayaraman J. Thiagarajan
TTA
42
5
0
22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
27
57
0
22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
92
0
19 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
30
22
0
12 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation
J. Heo
S. Azizi
A. Fayyazi
Massoud Pedram
23
0
0
08 May 2023
Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement
Ailin Deng
Miao Xiong
Bryan Hooi
41
6
0
02 May 2023
Modality-invariant Visual Odometry for Embodied Vision
Marius Memmel
Roman Bachmann
Amir Zamir
54
8
0
29 Apr 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
36
8
0
27 Apr 2023
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
Zhongzhi Yu
Shang Wu
Y. Fu
Shunyao Zhang
Yingyan Lin
33
6
0
25 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
44
273
0
24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
34
13
0
24 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
110
3,041
0
14 Apr 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification
Mohammad Reza Taesiri
Giang Nguyen
Sarra Habchi
C. Bezemer
Anh Totti Nguyen
VLM
34
20
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
Linking Representations with Multimodal Contrastive Learning
Abhishek Arora
Xinmei Yang
Shao-Yu Jheng
Melissa Dell
25
1
0
07 Apr 2023
ERM++: An Improved Baseline for Domain Generalization
Piotr Teterwak
Kuniaki Saito
Theodoros Tsiligkaridis
Kate Saenko
Bryan A. Plummer
OOD
38
9
0
04 Apr 2023
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation
Liang Zhu
Yingyue Li
Jiemin Fang
Yan Liu
Hao Xin
Wenyu Liu
Xinggang Wang
ViT
31
28
0
03 Apr 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
51
8
0
30 Mar 2023
Towards Understanding the Effect of Pretraining Label Granularity
Guanzhe Hong
Huayu Chen
Ariel Fuxman
Stanley H. Chan
Enming Luo
19
2
0
29 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
30
951
0
27 Mar 2023
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression
Denis Kuznedelev
Soroush Tabesh
Kimia Noorbakhsh
Elias Frantar
Sara Beery
Eldar Kurtic
Dan Alistarh
MQ
VLM
26
2
0
25 Mar 2023
Train/Test-Time Adaptation with Retrieval
L. Zancato
Alessandro Achille
Tian Yu Liu
Matthew Trager
Pramuditha Perera
Stefano Soatto
TTA
OOD
24
12
0
25 Mar 2023
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias
Puja Trivedi
Danai Koutra
Jayaraman J. Thiagarajan
AAML
40
17
0
23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
113
63
0
23 Mar 2023
Instance-Conditioned GAN Data Augmentation for Representation Learning
Pietro Astolfi
Arantxa Casanova
Jakob Verbeek
Pascal Vincent
Adriana Romero Soriano
M. Drozdzal
26
6
0
16 Mar 2023
High-level Feature Guided Decoding for Semantic Segmentation
Ye Huang
Di Kang
Shenghua Gao
Wen Li
Lixin Duan
23
0
0
15 Mar 2023
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection
Nikhil J. Dhinagar
Sophia I Thomopoulos
Emily Laltoo
Paul M. Thompson
DiffM
MedIm
47
16
0
14 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
Ziwei Liu
CLL
33
99
0
13 Mar 2023
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen
Shiu-hong Kao
Hao He
Weipeng Zhuo
Song Wen
Chul-Ho Lee
Shueng-Han Gary Chan
OOD
32
779
0
07 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
45
14
0
06 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging
J. Heo
Seyedarmin Azizi
A. Fayyazi
Massoud Pedram
36
3
0
04 Mar 2023
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective
Animesh Gupta
Irtiza Hassan
Dilip K. Prasad
D. K. Gupta
21
2
0
03 Mar 2023
Dropout Reduces Underfitting
Zhuang Liu
Zhi-Qin John Xu
Joseph Jin
Zhiqiang Shen
Trevor Darrell
37
36
0
02 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
39
221
0
27 Feb 2023
TBFormer: Two-Branch Transformer for Image Forgery Localization
Yaqi Liu
Binbin Lv
Xin Jin
Xiaoyue Chen
Xiaokun Zhang
ViT
18
27
0
25 Feb 2023
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet
Ido Galil
Mohammed Dabbah
Ran El-Yaniv
UQCV
24
28
0
23 Feb 2023
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers
Ido Galil
Mohammed Dabbah
Ran El-Yaniv
UQCV
30
24
0
23 Feb 2023
Steerable Equivariant Representation Learning
Sangnie Bhardwaj
Willie McClinton
Tongzhou Wang
Guillaume Lajoie
Chen Sun
Phillip Isola
Dilip Krishnan
OOD
LLMSV
34
5
0
22 Feb 2023
Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space
Weitang Liu
Ying-Wai Li
Yi-Zhuang You
Jingbo Shang
16
1
0
19 Feb 2023
Conformers are All You Need for Visual Speech Recognition
Oscar Chang
H. Liao
Dmitriy Serdyuk
Ankit Parag Shah
Olivier Siohan
VLM
48
14
0
17 Feb 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
26
6
0
16 Feb 2023
Tuning computer vision models with task rewards
André Susano Pinto
Alexander Kolesnikov
Yuge Shi
Lucas Beyer
Xiaohua Zhai
VLM
27
40
0
16 Feb 2023