Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

22 November 2016

Papers citing "Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond"

50 / 57 papers shown

Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home Viktor Moskvoretskii M. Lysyuk Mikhail Salnikov Nikolay Ivanov Sergey Pletenev Daria Galimzianova Nikita Krayko Vasily Konovalov Irina Nikishina Alexander Panchenko RALM 76 4 0 24 Feb 2025
High-dimensional manifold of solutions in neural networks: insights from statistical physics Enrico M. Malatesta 51 4 0 20 Feb 2025
Position: Curvature Matrices Should Be Democratized via Linear Operators Felix Dangel Runa Eschenhagen Weronika Ormaniec Andres Fernandez Lukas Tatzel Agustinus Kristiadi 58 3 0 31 Jan 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning Rémy Hosseinkhan Boucher Onofrio Semeraro L. Mathelin 82 0 0 28 Jan 2025
FOCUS: First Order Concentrated Updating Scheme Yizhou Liu Ziming Liu Jeff Gore ODL 108 1 0 21 Jan 2025
Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis Zhijie Chen Qiaobo Li A. Banerjee FedML 37 0 0 11 Nov 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks Jim Zhao Sidak Pal Singh Aurelien Lucchi AI4CE 48 0 0 04 Nov 2024
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis Weronika Ormaniec Felix Dangel Sidak Pal Singh 38 7 0 14 Oct 2024
Nesterov acceleration in benignly non-convex landscapes Kanan Gupta Stephan Wojtowytsch 39 2 0 10 Oct 2024
Unraveling the Hessian: A Key to Smooth Convergence in Loss Function Landscapes Nikita Kiselev Andrey Grabovoy 54 1 0 18 Sep 2024
Does SGD really happen in tiny subspaces? Minhak Song Kwangjun Ahn Chulhee Yun 71 5 1 25 May 2024
Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization S. Reifenstein T. Leleu Yoshihisa Yamamoto 48 1 0 02 May 2024
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer Yanjun Zhao Sizhe Dang Haishan Ye Guang Dai Yi Qian Ivor W. Tsang 66 8 0 23 Feb 2024
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks Gerard Ben Arous Reza Gheissari Jiaoyang Huang Aukosh Jagannath 35 14 0 04 Oct 2023
Accelerating Distributed ML Training via Selective Synchronization S. Tyagi Martin Swany FedML 32 3 0 16 Jul 2023
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu Zhiyuan Li David Leo Wright Hall Percy Liang Tengyu Ma VLM 55 130 0 23 May 2023
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training S. Tyagi Martin Swany 25 4 0 20 May 2023
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions Vladimir Feinberg Xinyi Chen Y. Jennifer Sun Rohan Anil Elad Hazan 29 12 0 07 Feb 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients Guihong Li Yuedong Yang Kartikeya Bhardwaj R. Marculescu 36 60 0 26 Jan 2023
On the Overlooked Structure of Stochastic Gradients Zeke Xie Qian-Yuan Tang Mingming Sun P. Li 31 6 0 05 Dec 2022
Noise Injection as a Probe of Deep Learning Dynamics Noam Levi I. Bloch M. Freytsis T. Volansky 40 2 0 24 Oct 2022
Precision Machine Learning Eric J. Michaud Ziming Liu Max Tegmark 24 34 0 24 Oct 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability Z. Li Zixuan Wang Jian Li 19 42 0 26 Jul 2022
Laplacian Autoencoders for Learning Stochastic Representations M. Miani Frederik Warburg Pablo Moreno-Muñoz Nicke Skafte Detlefsen Søren Hauberg UQCV BDL SSL 35 10 0 30 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization Vignesh Kothapalli 25 74 0 08 Jun 2022
Recycling Model Updates in Federated Learning: Are Gradient Subspaces Low-Rank? Sheikh Shams Azam Seyyedali Hosseinalipour Qiang Qiu Christopher G. Brinton FedML 26 20 0 01 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning Zeke Xie Qian-Yuan Tang Yunfeng Cai Mingming Sun P. Li ODL 42 9 0 31 Jan 2022
Eigenvalues of Autoencoders in Training and at Initialization Ben Dees S. Agarwala Corey Lowman 24 0 0 27 Jan 2022
Impact of classification difficulty on the weight matrices spectra in Deep Learning and application to early-stopping Xuran Meng Jianfeng Yao 25 7 0 26 Nov 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II Yossi Arjevani M. Field 28 18 0 21 Jul 2021
The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion D. Kunin Javier Sagastuy-Breña Lauren Gillespie Eshed Margalit Hidenori Tanaka Surya Ganguli Daniel L. K. Yamins 31 15 0 19 Jul 2021
Deep Learning Through the Lens of Example Difficulty R. Baldock Hartmut Maennel Behnam Neyshabur 47 156 0 17 Jun 2021
Appearance of Random Matrix Theory in Deep Learning Nicholas P. Baskerville Diego Granziol J. Keating 15 11 0 12 Feb 2021
Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues Ricard Durall Avraam Chatzimichailidis P. Labus J. Keuper GAN 30 57 0 17 Dec 2020
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization Adepu Ravi Sankar Yash Khasbage Rahul Vigneswaran V. Balasubramanian 25 42 0 07 Dec 2020
A Random Matrix Theory Approach to Damping in Deep Learning Diego Granziol Nicholas P. Baskerville AI4CE ODL 29 2 0 15 Nov 2020
PEP: Parameter Ensembling by Perturbation Alireza Mehrtash Purang Abolmaesumi Polina Golland Tina Kapur Demian Wassermann W. Wells 25 10 0 24 Oct 2020
Prevalence of Neural Collapse during the terminal phase of deep learning training Vardan Papyan Xuemei Han D. Donoho 29 549 0 18 Aug 2020
An analytic theory of shallow networks dynamics for hinge loss classification Franco Pellegrini Giulio Biroli 35 19 0 19 Jun 2020
Directional Pruning of Deep Neural Networks Shih-Kang Chao Zhanyu Wang Yue Xing Guang Cheng ODL 21 33 0 16 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training Diego Granziol S. Zohren Stephen J. Roberts ODL 37 49 0 16 Jun 2020
Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization Thomas O'Leary-Roseberry Nick Alger Omar Ghattas ODL 37 9 0 07 Feb 2020
Understanding and mitigating gradient pathologies in physics-informed neural networks Sizhuang He Yujun Teng P. Perdikaris AI4CE PINN 35 291 0 13 Jan 2020
Optimization for deep learning: theory and algorithms Ruoyu Sun ODL 19 168 0 19 Dec 2019
Geometry of learning neural quantum states Chae-Yeun Park M. Kastoryano 24 60 0 24 Oct 2019
Asymptotics of Wide Networks from Feynman Diagrams Ethan Dyer Guy Gur-Ari 26 113 0 25 Sep 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization Xinyan Li Qilong Gu Yingxue Zhou Tiancong Chen A. Banerjee ODL 42 51 0 24 Jul 2019
Weight-space symmetry in deep networks gives rise to permutation saddles, connected by equal-loss valleys across the loss landscape Johanni Brea Berfin Simsek Bernd Illing W. Gerstner 23 55 0 05 Jul 2019
Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians Vardan Papyan 22 87 0 24 Jan 2019
Gradient Descent Happens in a Tiny Subspace Guy Gur-Ari Daniel A. Roberts Ethan Dyer 28 228 0 12 Dec 2018