Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training
Max Mutschler, Kevin Laube, A. Zell
arXiv:2108.13880 · 31 August 2021

Papers citing "Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training" (11 papers)

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani, Aaron Mishkin, I. Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
24 May 2019

Parabolic Approximation Line Search for DNNs
Max Mutschler, A. Zell
28 Mar 2019

An Empirical Model of Large-Batch Training
Sam McCandlish, Jared Kaplan, Dario Amodei, OpenAI Dota Team
14 Dec 2018

Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler, K. Veschgini, M. Salmhofer, Fred Hamprecht
02 Mar 2018

A Walk with SGD
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
24 Feb 2018

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
01 Nov 2017

Online Learning Rate Adaptation with Hypergradient Descent
A. G. Baydin, R. Cornish, David Martínez-Rubio, Mark Schmidt, Frank Wood
14 Mar 2017

SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov, Frank Hutter
13 Aug 2016

Cyclical Learning Rates for Training Neural Networks
L. Smith
03 Jun 2015

Probabilistic Line Searches for Stochastic Optimization
Maren Mahsereci, Philipp Hennig
10 Feb 2015

Qualitatively characterizing neural network optimization problems
Ian Goodfellow, Oriol Vinyals, Andrew M. Saxe
19 Dec 2014