On the benefits of non-linear weight updates
Paul Norridge, 25 July 2022, arXiv:2207.12505

Cited By
Papers citing "On the benefits of non-linear weight updates" (19 papers shown)
1. Representation Based Complexity Measures for Predicting Generalization in Deep Learning — Parth Natekar, Manik Sharma (04 Dec 2020)
2. Sharpness-Aware Minimization for Efficiently Improving Generalization — Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur (03 Oct 2020)
3. Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers — Robin M. Schmidt, Frank Schneider, Philipp Hennig (03 Jul 2020)
4. Fantastic Generalization Measures and Where to Find Them — Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio (04 Dec 2019)
5. Lookahead Optimizer: k steps forward, 1 step back — Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba (19 Jul 2019)
6. Adaptive Gradient Methods with Dynamic Bound of Learning Rate — Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun (26 Feb 2019)
7. Asymmetric Valleys: Beyond Sharp and Flat Local Minima — Haowei He, Gao Huang, Yang Yuan (02 Feb 2019)
8. Don't Decay the Learning Rate, Increase the Batch Size — Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (01 Nov 2017)
9. Generalization in Deep Learning — Kenji Kawaguchi, L. Kaelbling, Yoshua Bengio (16 Oct 2017)
10. Neural Optimizer Search with Reinforcement Learning — Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le (21 Sep 2017)
11. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms — Han Xiao, Kashif Rasul, Roland Vollgraf (25 Aug 2017)
12. Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates — L. Smith, Nicholay Topin (23 Aug 2017)
13. Understanding deep learning requires rethinking generalization — Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (10 Nov 2016)
14. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys — Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina (06 Nov 2016)
15. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima — N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)
16. Wide Residual Networks — Sergey Zagoruyko, N. Komodakis (23 May 2016)
17. Cyclical Learning Rates for Training Neural Networks — L. Smith (03 Jun 2015)
18. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights — Weijie Su, Stephen P. Boyd, Emmanuel J. Candes (04 Mar 2015)
19. Striving for Simplicity: The All Convolutional Net — Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller (21 Dec 2014)