Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization

12 June 2024
Yuhang Cai, Jingfeng Wu, Song Mei, Michael Lindsey, Peter L. Bartlett
arXiv: 2406.08654

Papers citing "Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization"

22 papers shown
Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes
Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett
05 Apr 2025

Understanding the Generalization Benefits of Late Learning Rate Decay
Yinuo Ren, Chao Ma, Lexing Ying
AI4CE
21 Jan 2024

Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
26 Oct 2023

Learning threshold neurons via the "edge of stability"
Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang
MLT
14 Dec 2022

Understanding Edge-of-Stability Training Dynamics with a Minimalist Example
Xingyu Zhu, Zixuan Wang, Xiang Wang, Mo Zhou, Rong Ge
07 Oct 2022

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li, Zixuan Wang, Jian Li
26 Jul 2022

Beyond the Edge of Stability via Two-step Gradient Updates
Lei Chen, Joan Bruna
MLT
08 Jun 2022

Understanding the unstable convergence of gradient descent
Kwangjun Ahn, J.N. Zhang, S. Sra
03 Apr 2022

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao
AI4CE
07 Oct 2021

Fast Margin Maximization via Dual Acceleration
Ziwei Ji, Nathan Srebro, Matus Telgarsky
01 Jul 2021

When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett
09 Feb 2021

Implicit Gradient Regularization
David Barrett, Benoit Dherin
23 Sep 2020

Directional convergence and alignment in deep learning
Ziwei Ji, Matus Telgarsky
11 Jun 2020

Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss
Lénaïc Chizat, Francis R. Bach
MLT
11 Feb 2020

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Kaifeng Lyu, Jian Li
13 Jun 2019

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
MLT, ODL
04 Oct 2018

Implicit Bias of Gradient Descent on Linear Convolutional Networks
Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro
MDE
01 Jun 2018

A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei, Andrea Montanari, Phan-Minh Nguyen
MLT
18 Apr 2018

Risk and parameter convergence of logistic regression
Ziwei Ji, Matus Telgarsky
20 Mar 2018

Convergence of Gradient Descent on Separable Data
Mor Shpigel Nacson, Jason D. Lee, Suriya Gunasekar, Pedro H. P. Savarese, Nathan Srebro, Daniel Soudry
05 Mar 2018

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
3DH
08 Jun 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016