1712.07296
Cited By
Block-diagonal Hessian-free Optimization for Training Neural Networks
20 December 2017
Huishuai Zhang
Caiming Xiong
James Bradbury
R. Socher
ODL
Papers citing "Block-diagonal Hessian-free Optimization for Training Neural Networks"
5 / 5 papers shown
Title
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
61
1
0
31 Jan 2025
Debiasing Mini-Batch Quadratics for Applications in Deep Learning
Lukas Tatzel
Bálint Mucsányi
Osane Hackel
Philipp Hennig
43
0
0
18 Oct 2024
Batch Normalization Preconditioning for Neural Network Training
Susanna Lange
Kyle E. Helfrich
Qiang Ye
27
9
0
02 Aug 2021
Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia
Daniel Duckworth
S. Schoenholz
Ethan Dyer
Jascha Narain Sohl-Dickstein
27
13
0
17 Aug 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
284
2,889
0
15 Sep 2016