Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

21 November 2018

Quanquan Gu

Papers citing "Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks"

50 / 111 papers shown

Statistically guided deep learning Michael Kohler A. Krzyżak ODL BDL 79 0 0 11 Apr 2025
Extended convexity and smoothness and their applications in deep learning Binchuan Qi Wei Gong Li Li 61 0 0 08 Oct 2024
Loss Gradient Gaussian Width based Generalization and Optimization Guarantees A. Banerjee Qiaobo Li Yingxue Zhou 49 0 0 11 Jun 2024
NTK-Guided Few-Shot Class Incremental Learning Jingren Liu Zhong Ji Yanwei Pang Yunlong Yu CLL 39 3 0 19 Mar 2024
Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks M. Stojnic 24 1 0 13 Dec 2023
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds M. Stojnic 18 4 0 13 Dec 2023
Differentially Private Non-convex Learning for Multi-layer Neural Networks Hanpu Shen Cheng-Long Wang Zihang Xiang Yiming Ying Di Wang 46 7 0 12 Oct 2023
Fundamental Limits of Deep Learning-Based Binary Classifiers Trained with Hinge Loss T. Getu Georges Kaddoum M. Bennis 40 1 0 13 Sep 2023
Learning Prescriptive ReLU Networks Wei-Ju Sun Asterios Tsiourvas 21 2 0 01 Jun 2023
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks Ye Li Songcan Chen Shengyi Huang PINN 20 1 0 03 Mar 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models Yufeng Zhang Boyi Liu Qi Cai Lingxiao Wang Zhaoran Wang 53 11 0 30 Dec 2022
Characterizing the Spectrum of the NTK via a Power Series Expansion Michael Murray Hui Jin Benjamin Bowman Guido Montúfar 38 11 0 15 Nov 2022
Multilayer Perceptron Network Discriminates Larval Zebrafish Genotype using Behaviour Christopher Fusco Angel G Allen 21 0 0 06 Nov 2022
A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks Zhengdao Chen Eric Vanden-Eijnden Joan Bruna MLT 25 5 0 28 Oct 2022
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work Jiawei Zhang Yushun Zhang Mingyi Hong Ruoyu Sun Zhi-Quan Luo 26 10 0 21 Oct 2022
Global Convergence of SGD On Two Layer Neural Nets Pulkit Gopalani Anirbit Mukherjee 26 5 0 20 Oct 2022
On skip connections and normalisation layers in deep optimisation L. MacDonald Jack Valmadre Hemanth Saratchandran Simon Lucey ODL 19 1 0 10 Oct 2022
Why neural networks find simple solutions: the many regularizers of geometric complexity Benoit Dherin Michael Munn M. Rosca David Barrett 55 30 0 27 Sep 2022
Neural Networks can Learn Representations with Gradient Descent Alexandru Damian Jason D. Lee Mahdi Soltanolkotabi SSL MLT 19 114 0 30 Jun 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction Kaifeng Lyu Zhiyuan Li Sanjeev Arora FAtt 40 69 0 14 Jun 2022
On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Lam M. Nguyen Trang H. Tran 32 2 0 13 Jun 2022
Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials Eshaan Nichani Yunzhi Bai Jason D. Lee 27 10 0 08 Jun 2022
Non-convex online learning via algorithmic equivalence Udaya Ghai Zhou Lu Elad Hazan 14 8 0 30 May 2022
Global Convergence of Over-parameterized Deep Equilibrium Models Zenan Ling Xingyu Xie Qiuhao Wang Zongpeng Zhang Zhouchen Lin 32 12 0 27 May 2022
Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width Hanxu Zhou Qixuan Zhou Zhenyuan Jin Tao Luo Yaoyu Zhang Zhi-Qin John Xu 25 20 0 24 May 2022
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes Chao Ma D. Kunin Lei Wu Lexing Ying 25 27 0 24 Apr 2022
Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks Benjamin Bowman Guido Montúfar 23 11 0 12 Jan 2022
On the Convergence and Robustness of Adversarial Training Yisen Wang Xingjun Ma James Bailey Jinfeng Yi Bowen Zhou Quanquan Gu AAML 194 345 0 15 Dec 2021
Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions Martin Hutzenthaler Arnulf Jentzen Katharina Pohl Adrian Riekert Luca Scarpa MLT 34 6 0 13 Dec 2021
On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons Fangshuo Liao Anastasios Kyrillidis 41 16 0 05 Dec 2021
Embedding Principle: a hierarchical structure of loss landscape of deep neural networks Yaoyu Zhang Yuqing Li Zhongwang Zhang Tao Luo Z. Xu 29 21 0 30 Nov 2021
Learning with convolution and pooling operations in kernel methods Theodor Misiakiewicz Song Mei MLT 15 29 0 16 Nov 2021
Subquadratic Overparameterization for Shallow Neural Networks Chaehwan Song Ali Ramezani-Kebrya Thomas Pethick Armin Eftekhari V. Cevher 27 31 0 02 Nov 2021
Provable Regret Bounds for Deep Online Learning and Control Xinyi Chen Edgar Minasyan Jason D. Lee Elad Hazan 36 6 0 15 Oct 2021
A global convergence theory for deep ReLU implicit networks via over-parameterization Tianxiang Gao Hailiang Liu Jia Liu Hridesh Rajan Hongyang Gao MLT 28 16 0 11 Oct 2021
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent Spencer Frei Quanquan Gu 26 25 0 25 Jun 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization Mufan Bill Li Mihai Nica Daniel M. Roy 30 33 0 07 Jun 2021
Global Convergence of Three-layer Neural Networks in the Mean Field Regime H. Pham Phan-Minh Nguyen MLT AI4CE 41 19 0 11 May 2021
Generalization Guarantees for Neural Architecture Search with Train-Validation Split Samet Oymak Mingchen Li Mahdi Soltanolkotabi AI4CE OOD 36 13 0 29 Apr 2021
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions Arnulf Jentzen Adrian Riekert MLT 32 13 0 01 Apr 2021
Experiments with Rich Regime Training for Deep Learning Xinyan Li A. Banerjee 32 2 0 26 Feb 2021
Learning with invariances in random features and kernel models Song Mei Theodor Misiakiewicz Andrea Montanari OOD 46 89 0 25 Feb 2021
Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases Arnulf Jentzen T. Kröger ODL 28 7 0 23 Feb 2021
A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions Patrick Cheridito Arnulf Jentzen Adrian Riekert Florian Rossmannek 28 24 0 19 Feb 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training Cong Fang Hangfeng He Qi Long Weijie J. Su FAtt 130 165 0 29 Jan 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks Asaf Noy Yi Tian Xu Y. Aflalo Lihi Zelnik-Manor R. L. Jin 33 3 0 12 Jan 2021
Advances in Electron Microscopy with Deep Learning Jeffrey M. Ede 32 2 0 04 Jan 2021
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning Zeyuan Allen-Zhu Yuanzhi Li FedML 58 355 0 17 Dec 2020
On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces Zhuoran Yang Chi Jin Zhaoran Wang Mengdi Wang Michael I. Jordan 37 18 0 09 Nov 2020
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime Andrea Agazzi Jianfeng Lu 13 15 0 22 Oct 2020