On the benefits of non-linear weight updates

25 July 2022
Paul Norridge
arXiv: 2207.12505 (abs · PDF · HTML)

Papers citing "On the benefits of non-linear weight updates"

19 / 19 papers shown

Representation Based Complexity Measures for Predicting Generalization in Deep Learning
Parth Natekar, Manik Sharma · 04 Dec 2020

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur · AAML · 03 Oct 2020

Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
Robin M. Schmidt, Frank Schneider, Philipp Hennig · ODL · 03 Jul 2020

Fantastic Generalization Measures and Where to Find Them
Yiding Jiang, Behnam Neyshabur, H. Mobahi, Dilip Krishnan, Samy Bengio · AI4CE · 04 Dec 2019

Lookahead Optimizer: k steps forward, 1 step back
Michael Ruogu Zhang, James Lucas, Geoffrey E. Hinton, Jimmy Ba · ODL · 19 Jul 2019

Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun · ODL · 26 Feb 2019

Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan · ODL, MLT · 02 Feb 2019

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le · ODL · 01 Nov 2017

Generalization in Deep Learning
Kenji Kawaguchi, L. Kaelbling, Yoshua Bengio · ODL · 16 Oct 2017

Neural Optimizer Search with Reinforcement Learning
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc V. Le · ODL · 21 Sep 2017

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Han Xiao, Kashif Rasul, Roland Vollgraf · 25 Aug 2017

Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates
L. Smith, Nicholay Topin · AI4CE · 23 Aug 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · HAI · 10 Nov 2016

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina · ODL · 06 Nov 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016

Wide Residual Networks
Sergey Zagoruyko, N. Komodakis · 23 May 2016

Cyclical Learning Rates for Training Neural Networks
L. Smith · ODL · 03 Jun 2015

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes · 04 Mar 2015

Striving for Simplicity: The All Convolutional Net
Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller · FAtt · 21 Dec 2014