1406.2572
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
10 June 2014
Yann N. Dauphin
Razvan Pascanu
Çağlar Gülçehre
Kyunghyun Cho
Surya Ganguli
Yoshua Bengio
ODL
Papers citing
"Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"
50 / 213 papers shown
Geometry of Learning -- L2 Phase Transitions in Deep and Shallow Neural Networks
Ibrahim Talha Ersoy
Karoline Wiesner
25
0
0
10 May 2025
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Yuchen Fang
Javad Lavaei
Katya Scheinberg
39
0
0
24 Mar 2025
From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs
Rohan Bhatnagar
Ling Liang
Krish Patel
Haizhao Yang
36
0
0
13 Mar 2025
Hamiltonian Neural Networks for Robust Out-of-Time Credit Scoring
Javier Marín
86
0
0
13 Mar 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman
Lorena A. Barba
J. Martins
Thomas O'Leary-Roseberry
AI4CE
58
0
0
21 Feb 2025
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
63
0
0
08 Oct 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
32
0
0
05 Apr 2024
Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures
Akash Guna R.T
Arnav Chavan
Deepak Gupta
MDE
32
0
0
19 Feb 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
46
0
0
24 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
82
5
0
22 Jan 2024
GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar
AAML
23
2
0
16 Jan 2024
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
27
20
0
16 Oct 2023
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous
Reza Gheissari
Jiaoyang Huang
Aukosh Jagannath
35
14
0
04 Oct 2023
Fading memory as inductive bias in residual recurrent networks
I. Dubinin
Felix Effenberger
43
4
0
27 Jul 2023
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
Nishil Patel
Sebastian Lee
Stefano Sarao Mannelli
Sebastian Goldt
Andrew Saxe
OffRL
30
3
0
17 Jun 2023
Machine learning with tree tensor networks, CP rank constraints, and tensor dropout
Hao Chen
T. Barthel
48
7
0
30 May 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
28
20
0
29 May 2023
Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
Linxuan Pan
Shenghui Song
FedML
25
2
0
24 May 2023
ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa
Satoki Ishikawa
Rio Yokota
Shigang Li
Torsten Hoefler
ODL
40
14
0
08 May 2023
The R-mAtrIx Net
Shailesh Lal
Suvajit Majumder
E. Sobko
24
5
0
14 Apr 2023
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin
Botao Li
Tomer Galanti
Masakuni Ueda
37
7
0
23 Mar 2023
Complex Clipping for Improved Generalization in Machine Learning
L. Atlas
Nicholas Rasmussen
Felix Schwock
Mert Pilanci
20
0
0
27 Feb 2023
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin
20
7
0
03 Feb 2023
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Athul Shibu
Abhishek Kumar
Heechul Jung
Dong-Gyu Lee
17
1
0
26 Jan 2023
Exploring Complex Dynamical Systems via Nonconvex Optimization
Hunter L. Elliott
18
0
0
03 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
31
29
0
28 Dec 2022
Langevin algorithms for very deep Neural Networks with application to image classification
Pierre Bras
20
6
0
27 Dec 2022
Langevin algorithms for Markovian Neural Networks and Deep Stochastic control
Pierre Bras
Gilles Pagès
22
3
0
22 Dec 2022
Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls
Stephan Thaler
Gregor Doehner
Julija Zavadlav
35
21
0
15 Dec 2022
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
Mayank Baranwal
Param Budhraja
V. Raj
A. Hota
33
2
0
07 Dec 2022
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
31
6
0
05 Dec 2022
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
37
6
0
28 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
29
51
0
24 Nov 2022
Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent
Marco Bornstein
Jin-Peng Liu
Jingling Li
Furong Huang
21
0
0
17 Nov 2022
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Etienne Boursier
Loucas Pillaud-Vivien
Nicolas Flammarion
ODL
27
58
0
02 Jun 2022
Decoupling multivariate functions using a nonparametric filtered tensor decomposition
J. Decuyper
K. Tiels
S. Weiland
M. Runacres
J. Schoukens
24
3
0
23 May 2022
Training neural networks using Metropolis Monte Carlo and an adaptive variant
S. Whitelam
V. Selin
Ian Benlolo
Corneel Casert
Isaac Tamblyn
BDL
13
7
0
16 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
16
8
0
02 May 2022
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks
Aleksandar Doknic
Torsten Moller
36
2
0
09 Apr 2022
Random matrix analysis of deep neural network weight matrices
M. Thamm
Max Staats
B. Rosenow
35
12
0
28 Mar 2022
Myriad: a real-world testbed to bridge trajectory optimization and deep learning
Nikolaus H. R. Howe
Simon Dufort-Labbé
Nitarshan Rajkumar
Pierre-Luc Bacon
32
5
0
22 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
47
466
0
14 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
24
58
0
01 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
42
9
0
31 Jan 2022
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
ODL
43
23
0
28 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Forecasting Brain Activity Based on Models of Spatio-Temporal Brain Dynamics: A Comparison of Graph Neural Network Architectures
S. Wein
Alina Schüller
A. Tomé
W. Malloni
M. Greenlee
E. Lang
AI4CE
38
14
0
08 Dec 2021
Boosting Unsupervised Domain Adaptation with Soft Pseudo-label and Curriculum Learning
Shengjia Zhang
Tiancheng Lin
Yi Tian Xu
30
5
0
03 Dec 2021
Escape saddle points by a simple gradient-descent based algorithm
Chenyi Zhang
Tongyang Li
ODL
31
15
0
28 Nov 2021
NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning
Buyun Liang
Tim Mitchell
Ju Sun
17
3
0
27 Nov 2021