How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers

18 June 2021

Papers citing "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers"

Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance Jinwoo Kim Tien Dat Nguyen Ayhan Suleymanzade Hyeokjun An Seunghoon Hong 50 23 0 05 Jun 2023
Content-aware Token Sharing for Efficient Semantic Segmentation with Vision Transformers Chenyang Lu Daan de Geus Gijs Dubbelman ViT 25 20 0 03 Jun 2023
In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation Julian Bitterwolf Maximilian Müller Matthias Hein OODD 19 83 0 01 Jun 2023
Diffused Redundancy in Pre-trained Representations Vedant Nanda Till Speicher John P. Dickerson S. Feizi Krishna P. Gummadi Adrian Weller SSL 23 2 0 31 May 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers Hongjie Wang Bhishma Dedhia N. Jha ViT VLM 41 26 0 27 May 2023
Sharpness-Aware Minimization Leads to Low-Rank Features Maksym Andriushchenko Dara Bahri H. Mobahi Nicolas Flammarion AAML 25 25 0 25 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale Zhiwei Hao Jianyuan Guo Kai Han Han Hu Chang Xu Yunhe Wang 35 16 0 25 May 2023
Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining Emanuele Bugliarello Aida Nematzadeh Lisa Anne Hendricks SSL 24 5 0 23 May 2023
Target-Aware Generative Augmentations for Single-Shot Adaptation Kowshik Thopalli Rakshith Subramanyam P. Turaga Jayaraman J. Thiagarajan TTA 42 5 0 22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design Ibrahim M. Alabdulmohsin Xiaohua Zhai Alexander Kolesnikov Lucas Beyer VLM 27 57 0 22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models Hiroki Furuta Kuang-Huei Lee Ofir Nachum Yutaka Matsuo Aleksandra Faust S. Gu Izzeddin Gur LM&Ro 36 92 0 19 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding Emanuele Bugliarello Laurent Sartran Aishwarya Agrawal Lisa Anne Hendricks Aida Nematzadeh VLM 30 22 0 12 May 2023
CrAFT: Compression-Aware Fine-Tuning for Efficient Visual Task Adaptation J. Heo S. Azizi A. Fayyazi Massoud Pedram 23 0 0 08 May 2023
Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement Ailin Deng Miao Xiong Bryan Hooi 41 6 0 02 May 2023
Modality-invariant Visual Odometry for Embodied Vision Marius Memmel Roman Bachmann Amir Zamir 54 8 0 29 Apr 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition N. V. R. Chappa Pha Nguyen Alec Nelson Han-Seok Seo Xin Li P. Dobbs Khoa Luu ViT 36 8 0 27 Apr 2023
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning Zhongzhi Yu Shang Wu Y. Fu Shunyao Zhang Yingyan Lin 33 6 0 25 Apr 2023
A Cookbook of Self-Supervised Learning Randall Balestriero Mark Ibrahim Vlad Sobal Ari S. Morcos Shashank Shekhar ... Pierre Fernandez Amir Bar Hamed Pirsiavash Yann LeCun Micah Goldblum SyDa FedML SSL 44 273 0 24 Apr 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers A. Gritsenko Xuehan Xiong Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid Anurag Arnab ViT 34 13 0 24 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision Maxime Oquab Timothée Darcet Théo Moutakanni Huy Q. Vo Marc Szafraniec ... Hervé Jégou Julien Mairal Patrick Labatut Armand Joulin Piotr Bojanowski VLM CLIP SSL 110 3,041 0 14 Apr 2023
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification Mohammad Reza Taesiri Giang Nguyen Sarra Habchi C. Bezemer Anh Totti Nguyen VLM 34 20 0 11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review Li Shen Yan Sun Zhiyuan Yu Liang Ding Xinmei Tian Dacheng Tao VLM 30 41 0 07 Apr 2023
Linking Representations with Multimodal Contrastive Learning Abhishek Arora Xinmei Yang Shao-Yu Jheng Melissa Dell 25 1 0 07 Apr 2023
ERM++: An Improved Baseline for Domain Generalization Piotr Teterwak Kuniaki Saito Theodoros Tsiligkaridis Kate Saenko Bryan A. Plummer OOD 38 9 0 04 Apr 2023
WeakTr: Exploring Plain Vision Transformer for Weakly-supervised Semantic Segmentation Liang Zhu Yingyue Li Jiemin Fang Yan Liu Hao Xin Wenyu Liu Xinggang Wang ViT 31 28 0 03 Apr 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision Lucas Beyer Bo Wan Gagan Madan Filip Pavetić Andreas Steiner ... Emanuele Bugliarello Tianlin Li Qihang Yu Liang-Chieh Chen Xiaohua Zhai 51 8 0 30 Mar 2023
Towards Understanding the Effect of Pretraining Label Granularity Guanzhe Hong Huayu Chen Ariel Fuxman Stanley H. Chan Enming Luo 19 2 0 29 Mar 2023
Sigmoid Loss for Language Image Pre-Training Xiaohua Zhai Basil Mustafa Alexander Kolesnikov Lucas Beyer CLIP VLM 30 951 0 27 Mar 2023
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression Denis Kuznedelev Soroush Tabesh Kimia Noorbakhsh Elias Frantar Sara Beery Eldar Kurtic Dan Alistarh MQ VLM 26 2 0 25 Mar 2023
Train/Test-Time Adaptation with Retrieval L. Zancato Alessandro Achille Tian Yu Liu Matthew Trager Pramuditha Perera Stefano Soatto TTA OOD 24 12 0 25 Mar 2023
A Closer Look at Model Adaptation using Feature Distortion and Simplicity Bias Puja Trivedi Danai Koutra Jayaraman J. Thiagarajan AAML 40 17 0 23 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining Mannat Singh Quentin Duval Kalyan Vasudev Alwala Haoqi Fan Vaibhav Aggarwal ... Piotr Dollár Christoph Feichtenhofer Ross B. Girshick Rohit Girdhar Ishan Misra LRM 113 63 0 23 Mar 2023
Instance-Conditioned GAN Data Augmentation for Representation Learning Pietro Astolfi Arantxa Casanova Jakob Verbeek Pascal Vincent Adriana Romero Soriano M. Drozdzal 26 6 0 16 Mar 2023
High-level Feature Guided Decoding for Semantic Segmentation Ye Huang Di Kang Shenghua Gao Wen Li Lixin Duan 23 0 0 15 Mar 2023
Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection Nikhil J. Dhinagar Sophia I Thomopoulos Emily Laltoo Paul M. Thompson DiffM MedIm 47 16 0 14 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need Da-Wei Zhou Han-Jia Ye De-Chuan Zhan Ziwei Liu CLL 33 99 0 13 Mar 2023
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks Jierun Chen Shiu-hong Kao Hao He Weipeng Zhuo Song Wen Chul-Ho Lee Shueng-Han Gary Chan OOD 32 779 0 07 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition N. V. R. Chappa Pha Nguyen Alec Nelson Han-Seok Seo Xin Li P. Dobbs Khoa Luu ViT 45 14 0 06 Mar 2023
Training-Free Acceleration of ViTs with Delayed Spatial Merging J. Heo Seyedarmin Azizi A. Fayyazi Massoud Pedram 36 3 0 04 Mar 2023
Data-Efficient Training of CNNs and Transformers with Coresets: A Stability Perspective Animesh Gupta Irtiza Hassan Dilip K. Prasad D. K. Gupta 21 2 0 03 Mar 2023
Dropout Reduces Underfitting Zhuang Liu Zhi-Qin John Xu Joseph Jin Zhiqiang Shen Trevor Darrell 37 36 0 02 Mar 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning Antoine Yang Arsha Nagrani Paul Hongsuck Seo Antoine Miech Jordi Pont-Tuset Ivan Laptev Josef Sivic Cordelia Schmid AI4TS VLM 39 221 0 27 Feb 2023
TBFormer: Two-Branch Transformer for Image Forgery Localization Yaqi Liu Binbin Lv Xin Jin Xiaoyue Chen Xiaokun Zhang ViT 18 27 0 25 Feb 2023
A framework for benchmarking class-out-of-distribution detection and its application to ImageNet Ido Galil Mohammed Dabbah Ran El-Yaniv UQCV 24 28 0 23 Feb 2023
What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers Ido Galil Mohammed Dabbah Ran El-Yaniv UQCV 30 24 0 23 Feb 2023
Steerable Equivariant Representation Learning Sangnie Bhardwaj Willie McClinton Tongzhou Wang Guillaume Lajoie Chen Sun Phillip Isola Dilip Krishnan OOD LLMSV 34 5 0 22 Feb 2023
Gradient-based Wang-Landau Algorithm: A Novel Sampler for Output Distribution of Neural Networks over the Input Space Weitang Liu Ying-Wai Li Yi-Zhuang You Jingbo Shang 16 1 0 19 Feb 2023
Conformers are All You Need for Visual Speech Recognition Oscar Chang H. Liao Dmitriy Serdyuk Ankit Parag Shah Olivier Siohan VLM 48 14 0 17 Feb 2023
Efficiency 360: Efficient Vision Transformers Badri N. Patro Vijay Srinivas Agneeswaran 26 6 0 16 Feb 2023
Tuning computer vision models with task rewards André Susano Pinto Alexander Kolesnikov Yuge Shi Lucas Beyer Xiaohua Zhai VLM 27 40 0 16 Feb 2023