
When Does Preconditioning Help or Hurt Generalization?

18 June 2020
S. Amari, Jimmy Ba, Roger C. Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

Papers citing "When Does Preconditioning Help or Hurt Generalization?"

10 papers shown

Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods
A. Ma, Yangchen Pan, Amir-massoud Farahmand · AAML · 13 Aug 2023

Meta-Learning with a Geometry-Adaptive Preconditioner
Suhyun Kang, Duhun Hwang, Moonjung Eo, Taesup Kim, Wonjong Rhee · AI4CE · 04 Apr 2023

Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
Vladimir Feinberg, Xinyi Chen, Y. Jennifer Sun, Rohan Anil, Elad Hazan · 07 Feb 2023

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li · 30 Jun 2022

Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
Yiping Lu, Jose H. Blanchet, Lexing Ying · 15 May 2022

Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie · 06 Dec 2021

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein · 17 Aug 2020

A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying · MLT · 11 Mar 2020

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala · 02 Mar 2020

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 15 Sep 2016