arXiv: 1406.2572
Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

10 June 2014
Yann N. Dauphin
Razvan Pascanu
Çağlar Gülçehre
Kyunghyun Cho
Surya Ganguli
Yoshua Bengio
    ODL

Papers citing "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"

50 / 213 papers shown
Geometry of Learning -- L2 Phase Transitions in Deep and Shallow Neural Networks
Ibrahim Talha Ersoy
Karoline Wiesner
25
0
0
10 May 2025
High Probability Complexity Bounds of Trust-Region Stochastic Sequential Quadratic Programming with Heavy-Tailed Noise
Yuchen Fang
Javad Lavaei
Katya Scheinberg
39
0
0
24 Mar 2025
From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs
Rohan Bhatnagar
Ling Liang
Krish Patel
Haizhao Yang
36
0
0
13 Mar 2025
Hamiltonian Neural Networks for Robust Out-of-Time Credit Scoring
Javier Marín
86
0
0
13 Mar 2025
Verification and Validation for Trustworthy Scientific Machine Learning
John D. Jakeman
Lorena A. Barba
J. Martins
Thomas O'Leary-Roseberry
AI4CE
58
0
0
21 Feb 2025
Extended convexity and smoothness and their applications in deep learning
Binchuan Qi
Wei Gong
Li Li
63
0
0
08 Oct 2024
Statistical Mechanics and Artificial Neural Networks: Principles, Models, and Applications
Lucas Böttcher
Gregory R. Wheeler
32
0
0
05 Apr 2024
Beyond Uniform Scaling: Exploring Depth Heterogeneity in Neural Architectures
Akash Guna R.T
Arnav Chavan
Deepak Gupta
MDE
32
0
0
19 Feb 2024
The Definitive Guide to Policy Gradients in Deep Reinforcement Learning: Theory, Algorithms and Implementations
Matthias Lehmann
46
0
0
24 Jan 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
82
5
0
22 Jan 2024
GD doesn't make the cut: Three ways that non-differentiability affects neural network training
Siddharth Krishna Kumar
AAML
23
2
0
16 Jan 2024
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
27
20
0
16 Oct 2023
Spectral alignment of stochastic gradient descent for high-dimensional classification tasks
Gerard Ben Arous
Reza Gheissari
Jiaoyang Huang
Aukosh Jagannath
35
14
0
04 Oct 2023
Fading memory as inductive bias in residual recurrent networks
I. Dubinin
Felix Effenberger
43
4
0
27 Jul 2023
The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions
Nishil Patel
Sebastian Lee
Stefano Sarao Mannelli
Sebastian Goldt
Andrew Saxe
OffRL
30
3
0
17 Jun 2023
Machine learning with tree tensor networks, CP rank constraints, and tensor dropout
Hao Chen
T. Barthel
48
7
0
30 May 2023
Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
Haque Ishfaq
Qingfeng Lan
Pan Xu
A. R. Mahmood
Doina Precup
Anima Anandkumar
Kamyar Azizzadenesheli
BDL
OffRL
28
20
0
29 May 2023
Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
Linxuan Pan
Shenghui Song
FedML
25
2
0
24 May 2023
ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Kazuki Osawa
Satoki Ishikawa
Rio Yokota
Shigang Li
Torsten Hoefler
ODL
40
14
0
08 May 2023
The R-mAtrIx Net
Shailesh Lal
Suvajit Majumder
E. Sobko
24
5
0
14 Apr 2023
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin
Botao Li
Tomer Galanti
Masakuni Ueda
37
7
0
23 Mar 2023
Complex Clipping for Improved Generalization in Machine Learning
L. Atlas
Nicholas Rasmussen
Felix Schwock
Mert Pilanci
20
0
0
27 Feb 2023
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin
20
7
0
03 Feb 2023
Rewarded meta-pruning: Meta Learning with Rewards for Channel Pruning
Athul Shibu
Abhishek Kumar
Heechul Jung
Dong-Gyu Lee
17
1
0
26 Jan 2023
Exploring Complex Dynamical Systems via Nonconvex Optimization
Hunter L. Elliott
18
0
0
03 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
31
29
0
28 Dec 2022
Langevin algorithms for very deep Neural Networks with application to image classification
Pierre Bras
20
6
0
27 Dec 2022
Langevin algorithms for Markovian Neural Networks and Deep Stochastic control
Pierre Bras
Gilles Pagès
22
3
0
22 Dec 2022
Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls
Stephan Thaler
Gregor Doehner
Julija Zavadlav
35
21
0
15 Dec 2022
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
Mayank Baranwal
Param Budhraja
V. Raj
A. Hota
33
2
0
07 Dec 2022
On the Overlooked Structure of Stochastic Gradients
Zeke Xie
Qian-Yuan Tang
Mingming Sun
P. Li
31
6
0
05 Dec 2022
A survey of deep learning optimizers -- first and second order methods
Rohan Kashyap
ODL
37
6
0
28 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
29
51
0
24 Nov 2022
Escaping From Saddle Points Using Asynchronous Coordinate Gradient Descent
Marco Bornstein
Jin-Peng Liu
Jingling Li
Furong Huang
21
0
0
17 Nov 2022
Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
Etienne Boursier
Loucas Pillaud-Vivien
Nicolas Flammarion
ODL
27
58
0
02 Jun 2022
Decoupling multivariate functions using a nonparametric filtered tensor decomposition
J. Decuyper
K. Tiels
S. Weiland
M. Runacres
J. Schoukens
24
3
0
23 May 2022
Training neural networks using Metropolis Monte Carlo and an adaptive variant
S. Whitelam
V. Selin
Ian Benlolo
Corneel Casert
Isaac Tamblyn
BDL
13
7
0
16 May 2022
Gradient Descent, Stochastic Optimization, and Other Tales
Jun Lu
16
8
0
02 May 2022
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks
Aleksandar Doknic
Torsten Moller
36
2
0
09 Apr 2022
Random matrix analysis of deep neural network weight matrices
M. Thamm
Max Staats
B. Rosenow
35
12
0
28 Mar 2022
Myriad: a real-world testbed to bridge trajectory optimization and deep learning
Nikolaus H. R. Howe
Simon Dufort-Labbé
Nitarshan Rajkumar
Pierre-Luc Bacon
32
5
0
22 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
47
466
0
14 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
24
58
0
01 Feb 2022
On the Power-Law Hessian Spectrums in Deep Learning
Zeke Xie
Qian-Yuan Tang
Yunfeng Cai
Mingming Sun
P. Li
ODL
42
9
0
31 Jan 2022
Gradient Descent on Neurons and its Link to Approximate Second-Order Optimization
Frederik Benzing
ODL
43
23
0
28 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Forecasting Brain Activity Based on Models of Spatio-Temporal Brain Dynamics: A Comparison of Graph Neural Network Architectures
S. Wein
Alina Schüller
A. Tomé
W. Malloni
M. Greenlee
E. Lang
AI4CE
38
14
0
08 Dec 2021
Boosting Unsupervised Domain Adaptation with Soft Pseudo-label and Curriculum Learning
Shengjia Zhang
Tiancheng Lin
Yi Tian Xu
30
5
0
03 Dec 2021
Escape saddle points by a simple gradient-descent based algorithm
Chenyi Zhang
Tongyang Li
ODL
31
15
0
28 Nov 2021
NCVX: A User-Friendly and Scalable Package for Nonconvex Optimization in Machine Learning
Buyun Liang
Tim Mitchell
Ju Sun
17
3
0
27 Nov 2021