Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
arXiv:1706.04454, v3 (latest) · 14 June 2017
Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou
Papers citing "Empirical Analysis of the Hessian of Over-Parametrized Neural Networks" (35 papers shown)
FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry · MQ · 0 citations · 25 May 2025

Accelerating Neural Network Training Along Sharp and Flat Directions
Daniyar Zakarin, Sidak Pal Singh · ODL · 0 citations · 17 May 2025

Geometry of Learning -- L2 Phase Transitions in Deep and Shallow Neural Networks
Ibrahim Talha Ersoy, Karoline Wiesner · 0 citations · 10 May 2025

Training Large Neural Networks With Low-Dimensional Error Feedback
Maher Hanut, Jonathan Kadmon · 1 citation · 27 Feb 2025

High-dimensional manifold of solutions in neural networks: insights from statistical physics
Enrico M. Malatesta · 4 citations · 20 Feb 2025

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi · AI4CE · 0 citations · 04 Nov 2024

Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices
Chanwoo Chun, SueYeon Chung, Daniel D. Lee · 1 citation · 23 Oct 2024

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh · 7 citations · 14 Oct 2024

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun · 7 citations · 25 May 2024

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen · 2 citations · 30 Apr 2024

Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Pierre Wolinski · ODL · 0 citations · 06 Dec 2023

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha, Ruixuan Liu, Yi-xiao Liu, Hong Chen · 1 citation · 06 Dec 2023

Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
Tao Feng, Jie Zhang, Peizheng Wang, Zhijie Wang, Shengyuan Pang · DD · 0 citations · 29 May 2023

Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet · 4 citations · 06 Sep 2022

MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Siladittya Manna, Umapada Pal, Saumik Bhattacharya · SSL · 1 citation · 24 Nov 2021

Beyond Random Matrix Theory for Deep Networks
Diego Granziol · 16 citations · 13 Jun 2020

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari · FedML · 146 citations · 04 Jun 2018

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He · 3DH · 3,685 citations · 08 Jun 2017

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio · ODL · 774 citations · 15 Mar 2017

Opening the Black Box of Deep Neural Networks via Information
Ravid Shwartz-Ziv, Naftali Tishby · AI4CE · 1,419 citations · 02 Mar 2017

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun · UQCV · 236 citations · 22 Nov 2016

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · HAI · 4,636 citations · 10 Nov 2016

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina · ODL · 775 citations · 06 Nov 2016

Topology and Geometry of Half-Rectified Network Optimization
C. Freeman, Joan Bruna · 235 citations · 04 Nov 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL · 2,946 citations · 15 Sep 2016

The Landscape of Empirical Risk for Non-convex Losses
Song Mei, Yu Bai, Andrea Montanari · 313 citations · 22 Jul 2016

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes
Carlo Baldassi, C. Borgs, J. Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, R. Zecchina · 168 citations · 20 May 2016

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Ioannis Panageas, Georgios Piliouras · 142 citations · 02 May 2016

Gradient Descent Converges to Minimizers
Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht · 212 citations · 16 Feb 2016

Training Recurrent Neural Networks by Diffusion
H. Mobahi · ODL · 46 citations · 16 Jan 2016

On Graduated Optimization for Stochastic Non-Convex Problems
Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz · 117 citations · 12 Mar 2015

Explorations on high dimensional landscapes
Levent Sagun, V. U. Güney, Gerard Ben Arous, Yann LeCun · 65 citations · 20 Dec 2014

New insights and perspectives on the natural gradient method
James Martens · ODL · 631 citations · 03 Dec 2014

On the principal components of sample covariance matrices
Alex Bloemendal, Antti Knowles, H. Yau, J. Yin · 152 citations · 03 Apr 2014

No More Pesky Learning Rates
Tom Schaul, Sixin Zhang, Yann LeCun · 478 citations · 06 Jun 2012