arXiv:2406.08654
Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett
12 June 2024
Papers citing "Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization" (22 papers):
1. Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
2. Understanding the Generalization Benefits of Late Learning Rate Decay. Yinuo Ren, Chao Ma, Lexing Ying. 21 Jan 2024.
3. Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult. Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao. 26 Oct 2023.
4. Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022.
5. Understanding Edge-of-Stability Training Dynamics with a Minimalist Example. Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge. 07 Oct 2022.
6. Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability. Z. Li, Zixuan Wang, Jian Li. 26 Jul 2022.
7. Beyond the Edge of Stability via Two-step Gradient Updates. Lei Chen, Joan Bruna. 08 Jun 2022.
8. Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.
9. Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021.
10. Fast Margin Maximization via Dual Acceleration. Ziwei Ji, Nathan Srebro, Matus Telgarsky. 01 Jul 2021.
11. When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations? Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett. 09 Feb 2021.
12. Implicit Gradient Regularization. David Barrett, Benoit Dherin. 23 Sep 2020.
13. Directional convergence and alignment in deep learning. Ziwei Ji, Matus Telgarsky. 11 Jun 2020.
14. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss. Lénaïc Chizat, Francis R. Bach. 11 Feb 2020.
15. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks. Kaifeng Lyu, Jian Li. 13 Jun 2019.
16. Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018.
17. Implicit Bias of Gradient Descent on Linear Convolutional Networks. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 01 Jun 2018.
18. A Mean Field View of the Landscape of Two-Layers Neural Networks. Song Mei, Andrea Montanari, Phan-Minh Nguyen. 18 Apr 2018.
19. Risk and parameter convergence of logistic regression. Ziwei Ji, Matus Telgarsky. 20 Mar 2018.
20. Convergence of Gradient Descent on Separable Data. Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry. 05 Mar 2018.
21. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He. 08 Jun 2017.
22. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016.