
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems

Preetum Nakkiran · 15 May 2020

Papers citing "Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems"

8 / 8 papers shown

  • Unifying Global and Local Scene Entities Modelling for Precise Action Spotting
    Kim Hoang Tran, Phuc Vuong Do, Ngoc Quoc Ly, Ngan Le · 15 Apr 2024
  • Learning threshold neurons via the "edge of stability"
    Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · 14 Dec 2022
  • SGD with Large Step Sizes Learns Sparse Features
    Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion · 11 Oct 2022
  • On the Benefits of Large Learning Rates for Kernel Methods
    Gaspard Beugnot, Julien Mairal, Alessandro Rudi · 28 Feb 2022
  • Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
    Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao · 07 Oct 2021
  • How to decay your learning rate
    Aitor Lewkowycz · 23 Mar 2021
  • The large learning rate phase of deep learning: the catapult mechanism
    Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari · 04 Mar 2020
  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · 15 Sep 2016