Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
Chao Ma, D. Kunin, Lei Wu, Lexing Ying
arXiv:2204.11326 (v3, latest). 24 April 2022.
Papers citing "Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes" (23 papers shown):
- Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes. Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett. 05 Apr 2025.
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos. Dayal Singh Kalra, Tianyu He, M. Barkeshli. 17 Feb 2025.
- Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024.
- Understanding the unstable convergence of gradient descent. Kwangjun Ahn, J.N. Zhang, S. Sra. 03 Apr 2022.
- What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. Zhiyuan Li, Tianhao Wang, Sanjeev Arora. 13 Oct 2021.
- Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect. Yuqing Wang, Minshuo Chen, T. Zhao, Molei Tao. 07 Oct 2021.
- The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion. D. Kunin, Javier Sagastuy-Breña, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins. 19 Jul 2021.
- Label Noise SGD Provably Prefers Flat Global Minimizers. Alexandru Damian, Tengyu Ma, Jason D. Lee. 11 Jun 2021.
- On Linear Stability of SGD and Input-Smoothness of Neural Networks. Chao Ma, Lexing Ying. 27 May 2021.
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability. Jeremy M. Cohen, Simran Kaur, Yuanzhi Li, J. Zico Kolter, Ameet Talwalkar. 26 Feb 2021.
- Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function. Lingkai Kong, Molei Tao. 14 Feb 2020.
- Loss Landscape Sightseeing with Multi-Point Optimization. Ivan Skorokhodov, Andrey Kravchenko. 09 Oct 2019.
- At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry. 26 Sep 2019.
- Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process. Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant. 19 Apr 2019.
- A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics. E. Weinan, Chao Ma, Lei Wu. 08 Apr 2019.
- On Lazy Training in Differentiable Programming. Lénaïc Chizat, Edouard Oyallon, Francis R. Bach. 19 Dec 2018.
- Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent. Xiaowu Dai, Yuhua Zhu. 03 Dec 2018.
- Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks. Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu. 21 Nov 2018.
- A Convergence Theory for Deep Learning via Over-Parameterization. Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song. 09 Nov 2018.
- Gradient Descent Provably Optimizes Over-parameterized Neural Networks. S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh. 04 Oct 2018.
- Neural Tangent Kernel: Convergence and Generalization in Neural Networks. Arthur Jacot, Franck Gabriel, Clément Hongler. 20 Jun 2018.
- An Alternative View: When Does SGD Escape Local Minima? Robert D. Kleinberg, Yuanzhi Li, Yang Yuan. 17 Feb 2018.
- Sharp Minima Can Generalize For Deep Nets. Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio. 15 Mar 2017.