Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks
arXiv:1810.06767 · 16 October 2018
Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
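
The cited paper tracks an approximation of the Fisher information matrix to characterise training dynamics. As a rough, minimal sketch of the general idea (not the authors' specific estimator), the trace of the empirical Fisher can be monitored per minibatch as the sum of squared loss gradients over all parameters. The snippet below assumes PyTorch; model, loss_fn, inputs, and targets are hypothetical placeholders.

import torch

def empirical_fisher_trace(model, loss_fn, inputs, targets):
    # Minimal sketch (an assumption, not the paper's exact method):
    # tr(F) is approximated by the sum of squared gradients of the
    # minibatch loss over all trainable parameters.
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    return sum((p.grad.detach() ** 2).sum().item()
               for p in model.parameters() if p.grad is not None)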

Papers citing "Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks" (23 papers)
- Three Factors Influencing Minima in SGD (13 Nov 2017). Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey.
- Don't Decay the Learning Rate, Increase the Batch Size (01 Nov 2017). Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le.
- Adaptive Sampling Strategies for Stochastic Optimization (30 Oct 2017). Raghu Bollapragada, R. Byrd, J. Nocedal.
- A Bayesian Perspective on Generalization and Stochastic Gradient Descent (17 Oct 2017). Samuel L. Smith, Quoc V. Le.
- Empirical Analysis of the Hessian of Over-Parametrized Neural Networks (14 Jun 2017). Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou.
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (08 Jun 2017). Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He.
- The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization (08 Nov 2016). Etai Littwin, Lior Wolf.
- Entropy-SGD: Biasing Gradient Descent Into Wide Valleys (06 Nov 2016). Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina.
- Big Batch SGD: Automated Inference using Adaptive Batch Sizes (18 Oct 2016). Soham De, A. Yadav, David Jacobs, Tom Goldstein.
- On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (15 Sep 2016). N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang.
- Densely Connected Convolutional Networks (25 Aug 2016). Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger.
- Optimization Methods for Large-Scale Machine Learning (15 Jun 2016). Léon Bottou, Frank E. Curtis, J. Nocedal.
- No bad local minima: Data independent training error guarantees for multilayer neural networks (26 May 2016). Daniel Soudry, Y. Carmon.
- Deep Networks with Stochastic Depth (30 Mar 2016). Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Q. Weinberger.
- Deep Residual Learning for Image Recognition (10 Dec 2015). Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (11 Feb 2015). Sergey Ioffe, Christian Szegedy.
- Adam: A Method for Stochastic Optimization (22 Dec 2014). Diederik P. Kingma, Jimmy Ba.
- Qualitatively characterizing neural network optimization problems (19 Dec 2014). Ian Goodfellow, Oriol Vinyals, Andrew M. Saxe.
- New insights and perspectives on the natural gradient method (03 Dec 2014). James Martens.
- ImageNet Large Scale Visual Recognition Challenge (01 Sep 2014). Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei.
- Training Neural Networks with Stochastic Hessian-Free Optimization (16 Jan 2013). Ryan Kiros.
- ADADELTA: An Adaptive Learning Rate Method (22 Dec 2012). Matthew D. Zeiler.
- Hybrid Deterministic-Stochastic Methods for Data Fitting (13 Apr 2011). M. Friedlander, Mark Schmidt.