Robust Training of Neural Networks Using Scale Invariant Architectures
arXiv:2202.00980, 2 February 2022
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

Papers citing "Robust Training of Neural Networks Using Scale Invariant Architectures" (22 papers)

A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao
04 Mar 2025

HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
14 Oct 2024

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet
16 Jun 2024

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
22 May 2024

Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
05 Apr 2024

Efficient Language Model Architectures for Differentially Private Federated Learning
Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, A. Suresh
12 Mar 2024

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
Lénaic Chizat, Praneeth Netrapalli
30 Nov 2023

Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion
06 Oct 2023

Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
15 Sep 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization
Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You
05 Jul 2023

Universality and Limitations of Prompt Tuning
Yihan Wang, Jatin Chauhan, Wei Wang, Cho-Jui Hsieh
30 May 2023

Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alexandru Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
27 May 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
26 May 2023

Towards the Transferable Audio Adversarial Attack via Ensemble Methods
Feng Guo, Zhengyi Sun, Yuxuan Chen, Lei Ju
18 Apr 2023

Convergence of variational Monte Carlo simulation and scale-invariant pre-training
Nilin Abrahamsen, Zhiyan Ding, Gil Goldshlager, Lin Lin
21 Mar 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, J. Susskind
11 Mar 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
28 Oct 2022

A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora
11 Oct 2022

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge
07 Oct 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
04 Mar 2020