On the interplay between noise and curvature and its effect on optimization and generalization

18 June 2019
Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux
arXiv: 1906.07774

Papers citing "On the interplay between noise and curvature and its effect on optimization and generalization"

22 papers

An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland
10 Jun 2024

Fading memory as inductive bias in residual recurrent networks
I. Dubinin, Felix Effenberger
27 Jul 2023

Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
Marcel Kühn, B. Rosenow
08 Jun 2023

Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
Riade Benbaki, Wenyu Chen, X. Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder
28 Feb 2023

On the Lipschitz Constant of Deep Networks and Double Descent
Matteo Gamba, Hossein Azizpour, Mårten Björkman
28 Jan 2023

How to select an objective function using information theory
T. Hodson, T. Over, T. Smith, Lucy M. Marshall
10 Dec 2022

Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States
Ziqiao Wang, Yongyi Mao
19 Nov 2022

Noise Injection as a Probe of Deep Learning Dynamics
Noam Levi, I. Bloch, M. Freytsis, T. Volansky
24 Oct 2022

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, Shaoshuai Shi, Wei Wang, Bo Li
30 Jun 2022

Hybrid quantum ResNet for car classification and its hyperparameter optimization
Asel Sagingalieva, Mohammad Kordzanganeh, Andrii Kurkin, Artem Melnikov, Daniil Kuhmistrov, M. Perelshtein, A. Melnikov, Andrea Skolik, David Von Dollen
10 May 2022

On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie, Qian-Yuan Tang, Yunfeng Cai, Mingming Sun, P. Li
31 Jan 2022

A generalization gap estimation for overparameterized models via the Langevin functional variance
Akifumi Okuno, Keisuke Yano
07 Dec 2021

Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization
Alexandre Ramé, Corentin Dancette, Matthieu Cord
07 Sep 2021

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion
D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins
19 Jul 2021

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
Shaoshuai Shi, Lin Zhang, Bo Li
14 Jul 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information
Elias Frantar, Eldar Kurtic, Dan Alistarh
07 Jul 2021

Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
Charles H. Martin, Michael W. Mahoney
01 Jun 2021

Deep Learning is Singular, and That's Good
Daniel Murfet, Susan Wei, Biwei Huang, Hui Li, Jesse Gell-Redman, T. Quella
22 Oct 2020

When Does Preconditioning Help or Hurt Generalization?
S. Amari, Jimmy Ba, Roger C. Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
18 Jun 2020

Scalable and Practical Natural Gradient for Large-Scale Deep Learning
Kazuki Osawa, Yohei Tsuji, Yuichiro Ueno, Akira Naruse, Chuan-Sheng Foo, Rio Yokota
13 Feb 2020

Information-Theoretic Generalization Bounds for SGLD via Data-Dependent Estimates
Jeffrey Negrea, Mahdi Haghifam, Gintare Karolina Dziugaite, Ashish Khisti, Daniel M. Roy
06 Nov 2019

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016