ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.16541
  4. Cited By
AdaSGD: Bridging the gap between SGD and Adam

AdaSGD: Bridging the gap between SGD and Adam

30 June 2020
Jiaxuan Wang
Jenna Wiens
ArXivPDFHTML

Papers citing "AdaSGD: Bridging the gap between SGD and Adam"

18 / 18 papers shown
Title
Deconstructing What Makes a Good Optimizer for Language Models
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
92
23
0
10 Jul 2024
On Empirical Comparisons of Optimizers for Deep Learning
On Empirical Comparisons of Optimizers for Deep Learning
Dami Choi
Christopher J. Shallue
Zachary Nado
Jaehoon Lee
Chris J. Maddison
George E. Dahl
75
260
0
11 Oct 2019
Introduction to Online Convex Optimization
Introduction to Online Convex Optimization
Elad Hazan
OffRL
175
1,932
0
07 Sep 2019
Painless Stochastic Gradient: Interpolation, Line-Search, and
  Convergence Rates
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani
Aaron Mishkin
I. Laradji
Mark Schmidt
Gauthier Gidel
Simon Lacoste-Julien
ODL
84
209
0
24 May 2019
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo
Yuanhao Xiong
Yan Liu
Xu Sun
ODL
80
602
0
26 Feb 2019
Gradient Descent Happens in a Tiny Subspace
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
98
233
0
12 Dec 2018
On the Convergence of Adaptive Gradient Methods for Nonconvex
  Optimization
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Dongruo Zhou
Yiqi Tang
Yuan Cao
Ziyan Yang
Quanquan Gu
55
151
0
16 Aug 2018
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex
  Optimization
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
55
323
0
08 Aug 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training
  Deep Neural Networks
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
79
193
0
18 Jun 2018
AutoAugment: Learning Augmentation Policies from Data
AutoAugment: Learning Augmentation Policies from Data
E. D. Cubuk
Barret Zoph
Dandelion Mané
Vijay Vasudevan
Quoc V. Le
126
1,772
0
24 May 2018
On the Convergence of Stochastic Gradient Descent with Adaptive
  Stepsizes
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyun Li
Francesco Orabona
69
295
0
21 May 2018
Improving Generalization Performance by Switching from Adam to SGD
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar
R. Socher
ODL
98
524
0
20 Dec 2017
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning
  Algorithms
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao
Kashif Rasul
Roland Vollgraf
283
8,883
0
25 Aug 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
65
1,032
0
23 May 2017
Multitask learning and benchmarking with clinical time series data
Multitask learning and benchmarking with clinical time series data
Hrayr Harutyunyan
Hrant Khachatrian
David C. Kale
Greg Ver Steeg
Aram Galstyan
OOD
AI4TS
184
878
0
22 Mar 2017
Understanding deep learning requires rethinking generalization
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
339
4,629
0
10 Nov 2016
Pointer Sentinel Mixture Models
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
319
2,876
0
26 Sep 2016
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
346
10,070
0
10 Feb 2015
1