Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
21 June 2024 · arXiv:2406.15244

Papers citing "Large Batch Analysis for Adagrad Under Anisotropic Smoothness" (22 papers shown)

Why Transformers Need Adam: A Hessian Perspective
Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhimin Luo
26 Feb 2024

On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong, Junhong Lin
06 Feb 2024

High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize
Ali Kavis, Kfir Y. Levy, Volkan Cevher
06 Apr 2022

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge
08 Oct 2020

Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Yossi Arjevani, M. Field
04 Aug 2020

A Simple Convergence Proof of Adam and Adagrad
Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier
05 Mar 2020

Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik
09 Feb 2020

A Survey on Distributed Machine Learning
Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer
20 Dec 2019

On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
11 Oct 2019

The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge, Sham Kakade, Rahul Kidambi, Praneeth Netrapalli
29 Apr 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh
01 Apr 2019

Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD
Phuong Ha Nguyen, Lam M. Nguyen, Marten van Dijk
10 Oct 2018

Online Adaptive Methods, Universality and Acceleration
Kfir Y. Levy, A. Yurtsever, Volkan Cevher
08 Sep 2018

Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration
Soham De, Anirbit Mukherjee, Enayat Ullah
18 Jul 2018

On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyun Li, Francesco Orabona
21 May 2018

WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu, Rachel A. Ward, Léon Bottou
07 Mar 2018

Large Batch Training of Convolutional Networks
Yang You, Igor Gitman, Boris Ginsburg
13 Aug 2017

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun
22 Nov 2016

SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov, Frank Hutter
13 Aug 2016

Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming
Saeed Ghadimi, Guanghui Lan
22 Sep 2013

ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
22 Dec 2012

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir, Tong Zhang
08 Dec 2012