arXiv:2404.04454
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
5 April 2024
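For context, a minimal sketch of the decoupled AdamW update that the title refers to: standard Adam moment estimates, with weight decay applied directly to the parameters rather than folded into the gradient. This is the generic textbook update, not the paper's own notation or code; hyperparameter names and default values below are illustrative.

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One decoupled AdamW step (illustrative defaults, not the paper's setup)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term multiplies theta directly.
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Toy usage: minimize f(theta) = 0.5 * ||theta||^2 from a random start.
theta = np.random.randn(4)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = theta                                # gradient of 0.5 * ||theta||^2
    theta, m, v = adamw_step(theta, grad, m, v, t)
```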
Papers citing "Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization" (13 papers):
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao (11 Apr 2025)

Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato (31 Jan 2025)

Convergence Rate Analysis of LION
Yiming Dong, Huan Li, Zhouchen Lin (12 Nov 2024)

A Mirror Descent Perspective of Smoothed Sign Descent
Shuyang Wang, Diego Klabjan (18 Oct 2024)

Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland (17 Jul 2024)

The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao (15 Jun 2024)

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun (25 May 2024)

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin (06 Feb 2024)

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner, Jacques Chen, J. Lavington, Mark W. Schmidt (27 Apr 2023)

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora (20 May 2022)

Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora, Zhiyuan Li, A. Panigrahi (19 May 2022)

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (13 Oct 2021)

A Simple Convergence Proof of Adam and Adagrad
Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier (05 Mar 2020)