On the benefits of non-linear weight updates

25 July 2022
Paul Norridge
arXiv: 2207.12505 (abs · PDF · HTML)

Papers citing "On the benefits of non-linear weight updates"

19 / 19 papers shown

Representation Based Complexity Measures for Predicting Generalization in Deep Learning
Parth Natekar, Manik Sharma · 04 Dec 2020

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur · AAML · 03 Oct 2020

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig · ODL · 03 Jul 2020

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio · AI4CE · 04 Dec 2019

Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba · ODL · 19 Jul 2019

Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun · ODL · 26 Feb 2019

Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan · ODL, MLT · 02 Feb 2019

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le · ODL · 01 Nov 2017

Generalization in Deep Learning
Kenji Kawaguchi, L. Kaelbling, Yoshua Bengio · ODL · 16 Oct 2017

Neural Optimizer Search with Reinforcement Learning
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le · ODL · 21 Sep 2017

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, Roland Vollgraf · 25 Aug 2017

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith, Nicholay Topin · AI4CE · 23 Aug 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · HAI · 10 Nov 2016

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina · ODL · 06 Nov 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016

Wide Residual Networks
Sergey Zagoruyko, N. Komodakis · 23 May 2016

Cyclical Learning Rates for Training Neural Networks
L. Smith · ODL · 03 Jun 2015

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes · 04 Mar 2015

Striving for Simplicity: The All Convolutional Net
Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller · FAtt · 21 Dec 2014