Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert Mansel Gower
arXiv:2403.04081 · 6 March 2024

Papers citing "Directional Smoothness and Gradient Methods: Convergence and Adaptivity"

29 papers shown.

Glocal Smoothness: Line Search can really help!
  Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt · 14 Jun 2025

New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
  Francesco Orabona, Ryan D'Orazio · 26 May 2025

Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization
  Yue Yang, Yi Zhou, Zhaosong Lu · 29 Mar 2025

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
  Si Yi Meng, Antonio Orvieto, Daniel Yiming Cao, Christopher De Sa · 07 Jun 2024

SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning
  Avetik G. Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik · FedML · 30 May 2024

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
  Aaron Mishkin, Mert Pilanci, Mark Schmidt · 03 Apr 2024

Non-Uniform Smoothness for Gradient Descent
  A. Berahas, Lindon Roberts, Fred Roosta · 15 Nov 2023

Normalized Gradients for All
  Francesco Orabona · 10 Aug 2023

Convex and Non-convex Optimization Under Generalized Smoothness
  Haochuan Li, Jian Qian, Yi Tian, Alexander Rakhlin, Ali Jadbabaie · 02 Jun 2023

Toward Understanding Why Adam Converges Faster Than SGD for Transformers
  Yan Pan, Yuanzhi Li · 31 May 2023

Adaptive Gradient Methods at the Edge of Stability
  Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer · ODL · 29 Jul 2022

Accelerated first-order methods for convex optimization with locally Lipschitz continuous gradient
  Zhaosong Lu, Sanyou Mei · 02 Jun 2022

Making SGD Parameter-Free
  Y. Carmon, Oliver Hinder · 04 May 2022

Understanding the unstable convergence of gradient descent
  Kwangjun Ahn, J.N. Zhang, S. Sra · 03 Apr 2022

A first-order primal-dual method with adaptivity to local smoothness
  Maria-Luiza Vladarean, Yura Malitsky, Volkan Cevher · 28 Oct 2021

Eigencurve: Optimal Learning Rate Schedule for SGD on Quadratic Objectives with Skewed Hessian Spectrums
  Boyao Wang, Haishan Ye, Tong Zhang · 27 Oct 2021

Leveraging Non-uniformity in First-order Non-convex Optimization
  Jincheng Mei, Yue Gao, Bo Dai, Csaba Szepesvári, Dale Schuurmans · 13 May 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
  Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar · ODL · 26 Feb 2021

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
  Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora · 06 Oct 2020

Improved Analysis of Clipping Algorithms for Non-convex Optimization
  Bohang Zhang, Jikai Jin, Cong Fang, Liwei Wang · 05 Oct 2020

Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis
  Courtney Paquette, B. V. Merrienboer, Elliot Paquette, Fabian Pedregosa · 08 Jun 2020

PyTorch: An Imperative Style, High-Performance Deep Learning Library
  Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala · ODL · 03 Dec 2019

Why gradient clipping accelerates training: A theoretical justification for adaptivity
  J.N. Zhang, Tianxing He, S. Sra, Ali Jadbabaie · 28 May 2019

Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity
  Benjamin Grimmer · 12 Dec 2017

Online to Offline Conversions, Universality and Adaptive Minibatch Sizes
  Kfir Y. Levy · ODL · 30 May 2017

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
  Hamed Karimi, J. Nutini, Mark Schmidt · 16 Aug 2016

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · VLM · 06 Feb 2015

Practical recommendations for gradient-based training of deep architectures
  Yoshua Bengio · 3DH, ODL · 24 Jun 2012

Less Regret via Online Conditioning
  Matthew J. Streeter, H. B. McMahan · ODL · 25 Feb 2010