Robust Training of Neural Networks Using Scale Invariant Architectures
arXiv:2202.00980, 2 February 2022
Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

Papers citing "Robust Training of Neural Networks Using Scale Invariant Architectures" (22 papers)

A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, S. Du, T. Zhao
04 Mar 2025

HSR-Enhanced Sparse Attention Acceleration
Bo Chen, Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song
14 Oct 2024

Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner, Seanie Lee, Ilja Baumann, Philipp Seeberger, K. Riedhammer, Tobias Bocklet
16 Jun 2024

How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
22 May 2024

Implicit Bias of AdamW: ℓ∞ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
05 Apr 2024

Efficient Language Model Architectures for Differentially Private Federated Learning
Jae Hun Ro, Srinadh Bhojanapalli, Zheng Xu, Yanxiang Zhang, A. Suresh
12 Mar 2024

The Feature Speed Formula: a flexible approach to scale hyper-parameters of deep neural networks
Lénaic Chizat, Praneeth Netrapalli
30 Nov 2023

Why Do We Need Weight Decay in Modern Deep Learning?
Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion
06 Oct 2023

Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, Simon Kornblith
15 Sep 2023

CAME: Confidence-guided Adaptive Memory Efficient Optimization
Yang Luo, Xiaozhe Ren, Zangwei Zheng, Zhuo Jiang, Xin Jiang, Yang You
05 Jul 2023

Universality and Limitations of Prompt Tuning
Yihan Wang, Jatin Chauhan, Wei Wang, Cho-Jui Hsieh
30 May 2023

Fine-Tuning Language Models with Just Forward Passes
Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alexandru Damian, Jason D. Lee, Danqi Chen, Sanjeev Arora
27 May 2023

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
26 May 2023

Towards the Transferable Audio Adversarial Attack via Ensemble Methods
Feng Guo, Zhengyi Sun, Yuxuan Chen, Lei Ju
18 Apr 2023

Convergence of variational Monte Carlo simulation and scale-invariant pre-training
Nilin Abrahamsen, Zhiyan Ding, Gil Goldshlager, Lin Lin
21 Mar 2023

Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, J. Susskind
11 Mar 2023

Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis
Taiki Miyagawa
28 Oct 2022

A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora
11 Oct 2022

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge
07 Oct 2022

Training Scale-Invariant Neural Networks on the Sphere Can Happen in Three Regimes
M. Kodryan, E. Lobacheva, M. Nakhodnov, Dmitry Vetrov
08 Sep 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
04 Mar 2020