arXiv:2404.04454
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
5 April 2024
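For context, a minimal sketch of the decoupled AdamW update that the title refers to: standard Adam moment estimates, with weight decay applied directly to the parameters rather than folded into the gradient. This is the generic textbook update, not the paper's own notation or code; hyperparameter names and default values below are illustrative.

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One decoupled AdamW step (illustrative defaults, not the paper's setup)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term multiplies theta directly.
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v

# Toy usage: minimize f(theta) = 0.5 * ||theta||^2 from a random start.
theta = np.random.randn(4)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 1001):
    grad = theta                                # gradient of 0.5 * ||theta||^2
    theta, m, v = adamw_step(theta, grad, m, v, t)
```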
Papers citing "Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization" (13 papers):
Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao (11 Apr 2025)

Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari, Issei Sato (31 Jan 2025)

Convergence Rate Analysis of LION
Yiming Dong, Huan Li, Zhouchen Lin (12 Nov 2024)

A Mirror Descent Perspective of Smoothed Sign Descent
Shuyang Wang, Diego Klabjan (18 Oct 2024)

Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland (17 Jul 2024)

The Implicit Bias of Adam on Separable Data
Chenyang Zhang, Difan Zou, Yuan Cao (15 Jun 2024)

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun (25 May 2024)

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin (06 Feb 2024)

Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner, Jacques Chen, J. Lavington, Mark W. Schmidt (27 Apr 2023)

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi, Kaifeng Lyu, A. Panigrahi, Sanjeev Arora (20 May 2022)

Understanding Gradient Descent on Edge of Stability in Deep Learning
Sanjeev Arora, Zhiyuan Li, A. Panigrahi (19 May 2022)

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (13 Oct 2021)

A Simple Convergence Proof of Adam and Adagrad
Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier (05 Mar 2020)