Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou
14 June 2017 (arXiv: 1706.04454)

Papers citing "Empirical Analysis of the Hessian of Over-Parametrized Neural Networks" (35 papers)
FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry (25 May 2025)

Accelerating Neural Network Training Along Sharp and Flat Directions
Daniyar Zakarin, Sidak Pal Singh (17 May 2025)

Geometry of Learning -- L2 Phase Transitions in Deep and Shallow Neural Networks
Ibrahim Talha Ersoy, Karoline Wiesner (10 May 2025)

Training Large Neural Networks With Low-Dimensional Error Feedback
Maher Hanut, Jonathan Kadmon (27 Feb 2025)

High-dimensional manifold of solutions in neural networks: insights from statistical physics
Enrico M. Malatesta (20 Feb 2025)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi (04 Nov 2024)

Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices
Chanwoo Chun, SueYeon Chung, Daniel D. Lee (23 Oct 2024)

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh (14 Oct 2024)

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun (25 May 2024)

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen (30 Apr 2024)

Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Pierre Wolinski (06 Dec 2023)

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha, Ruixuan Liu, Yi-xiao Liu, Hong Chen (06 Dec 2023)

Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
Tao Feng, Jie Zhang, Peizheng Wang, Zhijie Wang, Shengyuan Pang (29 May 2023)

Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet (06 Sep 2022)

MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Siladittya Manna, Umapada Pal, Saumik Bhattacharya (24 Nov 2021)

Beyond Random Matrix Theory for Deep Networks
Diego Granziol (13 Jun 2020)

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari (04 Jun 2018)

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He (08 Jun 2017)

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio (15 Mar 2017)

Opening the Black Box of Deep Neural Networks via Information
Ravid Shwartz-Ziv, Naftali Tishby (02 Mar 2017)

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun (22 Nov 2016)

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (10 Nov 2016)

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina (06 Nov 2016)

Topology and Geometry of Half-Rectified Network Optimization
C. Freeman, Joan Bruna (04 Nov 2016)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)

The Landscape of Empirical Risk for Non-convex Losses
Song Mei, Yu Bai, Andrea Montanari (22 Jul 2016)

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes
Carlo Baldassi, C. Borgs, J. Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, R. Zecchina (20 May 2016)

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Ioannis Panageas, Georgios Piliouras (02 May 2016)

Gradient Descent Converges to Minimizers
Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht (16 Feb 2016)

Training Recurrent Neural Networks by Diffusion
H. Mobahi (16 Jan 2016)

On Graduated Optimization for Stochastic Non-Convex Problems
Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz (12 Mar 2015)

Explorations on high dimensional landscapes
Levent Sagun, V. U. Güney, Gerard Ben Arous, Yann LeCun (20 Dec 2014)

New insights and perspectives on the natural gradient method
James Martens (03 Dec 2014)

On the principal components of sample covariance matrices
Alex Bloemendal, Antti Knowles, H. Yau, J. Yin (03 Apr 2014)

No More Pesky Learning Rates
Tom Schaul, Sixin Zhang, Yann LeCun (06 Jun 2012)