Empirical Analysis of the Hessian of Over-Parametrized Neural Networks
Levent Sagun, Utku Evci, V. U. Güney, Yann N. Dauphin, Léon Bottou
14 June 2017 (arXiv: 1706.04454)

Papers citing "Empirical Analysis of the Hessian of Over-Parametrized Neural Networks" (35 papers)
FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry (25 May 2025)

Accelerating Neural Network Training Along Sharp and Flat Directions
Daniyar Zakarin, Sidak Pal Singh (17 May 2025)

Geometry of Learning -- L2 Phase Transitions in Deep and Shallow Neural Networks
Ibrahim Talha Ersoy, Karoline Wiesner (10 May 2025)

Training Large Neural Networks With Low-Dimensional Error Feedback
Maher Hanut, Jonathan Kadmon (27 Feb 2025)

High-dimensional manifold of solutions in neural networks: insights from statistical physics
Enrico M. Malatesta (20 Feb 2025)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi (04 Nov 2024)

Estimating the Spectral Moments of the Kernel Integral Operator from Finite Sample Matrices
Chanwoo Chun, SueYeon Chung, Daniel D. Lee (23 Oct 2024)

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh (14 Oct 2024)

Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun (25 May 2024)

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li, Junyu Liu, Hanrui Wang, Tianlong Chen (30 Apr 2024)

Adapting Newton's Method to Neural Networks through a Summary of Higher-Order Derivatives
Pierre Wolinski (06 Dec 2023)

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
Haichao Sha, Ruixuan Liu, Yi-xiao Liu, Hong Chen (06 Dec 2023)

Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
Tao Feng, Jie Zhang, Peizheng Wang, Zhijie Wang, Shengyuan Pang (29 May 2023)

Generalisation under gradient descent via deterministic PAC-Bayes
Eugenio Clerico, Tyler Farghly, George Deligiannidis, Benjamin Guedj, Arnaud Doucet (06 Sep 2022)

MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning
Siladittya Manna, Umapada Pal, Saumik Bhattacharya (24 Nov 2021)

Beyond Random Matrix Theory for Deep Networks
Diego Granziol (13 Jun 2020)

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari (04 Jun 2018)

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He (08 Jun 2017)

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio (15 Mar 2017)

Opening the Black Box of Deep Neural Networks via Information
Ravid Shwartz-Ziv, Naftali Tishby (02 Mar 2017)

Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond
Levent Sagun, Léon Bottou, Yann LeCun (22 Nov 2016)

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals (10 Nov 2016)

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina (06 Nov 2016)

Topology and Geometry of Half-Rectified Network Optimization
C. Freeman, Joan Bruna (04 Nov 2016)

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)

The Landscape of Empirical Risk for Non-convex Losses
Song Mei, Yu Bai, Andrea Montanari (22 Jul 2016)

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes
Carlo Baldassi, C. Borgs, J. Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, R. Zecchina (20 May 2016)

Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Ioannis Panageas, Georgios Piliouras (02 May 2016)

Gradient Descent Converges to Minimizers
Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht (16 Feb 2016)

Training Recurrent Neural Networks by Diffusion
H. Mobahi (16 Jan 2016)

On Graduated Optimization for Stochastic Non-Convex Problems
Elad Hazan, Kfir Y. Levy, Shai Shalev-Shwartz (12 Mar 2015)

Explorations on high dimensional landscapes
Levent Sagun, V. U. Güney, Gerard Ben Arous, Yann LeCun (20 Dec 2014)

New insights and perspectives on the natural gradient method
James Martens (03 Dec 2014)

On the principal components of sample covariance matrices
Alex Bloemendal, Antti Knowles, H. Yau, J. Yin (03 Apr 2014)

No More Pesky Learning Rates
Tom Schaul, Sixin Zhang, Yann LeCun (06 Jun 2012)