Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks

21 November 2018

Quanquan Gu

Papers citing "Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks"

50 / 116 papers shown

Prediction intervals for Deep Neural Networks | Tullio Mancini, Hector F. Calvo-Pardo, Jose Olmo | UQCV OOD 23 4 0 | 08 Oct 2020
On the linearity of large non-linear models: when and why the tangent kernel is constant | Chaoyue Liu, Libin Zhu, M. Belkin | 21 140 0 | 02 Oct 2020
Deep Equals Shallow for ReLU Networks in Kernel Regimes | A. Bietti, Francis R. Bach | 28 86 0 | 30 Sep 2020
Tensor Programs III: Neural Matrix Laws | Greg Yang | 14 43 0 | 22 Sep 2020
Review: Deep Learning in Electron Microscopy | Jeffrey M. Ede | 34 79 0 | 17 Sep 2020
GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training | Tianle Cai, Shengjie Luo, Keyulu Xu, Di He, Tie-Yan Liu, Liwei Wang | GNN 32 158 0 | 07 Sep 2020
Predicting Training Time Without Training | L. Zancato, Alessandro Achille, Avinash Ravichandran, Rahul Bhotika, Stefano Soatto | 26 24 0 | 28 Aug 2020
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy | Zuyue Fu, Zhuoran Yang, Zhaoran Wang | 15 42 0 | 02 Aug 2020
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy | E. Moroshko, Suriya Gunasekar, Blake E. Woodworth, J. Lee, Nathan Srebro, Daniel Soudry | 35 85 0 | 13 Jul 2020
Tensor Programs II: Neural Tangent Kernel for Any Architecture | Greg Yang | 58 134 0 | 25 Jun 2020
Directional convergence and alignment in deep learning | Ziwei Ji, Matus Telgarsky | 17 163 0 | 11 Jun 2020
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory | Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang | OOD MLT 105 11 0 | 08 Jun 2020
Is deeper better? It depends on locality of relevant features | Takashi Mori, Masahito Ueda | OOD 25 4 0 | 26 May 2020
Feature Purification: How Adversarial Training Performs Robust Deep Learning | Zeyuan Allen-Zhu, Yuanzhi Li | MLT AAML 35 147 0 | 20 May 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth | Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying | MLT 39 78 0 | 11 Mar 2020
The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL 159 234 0 | 04 Mar 2020
Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation | Arnulf Jentzen, Timo Welti | 17 15 0 | 03 Mar 2020
Warwick Electron Microscopy Datasets | Jeffrey M. Ede | 19 14 0 | 02 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks | Chaoyue Liu, Libin Zhu, M. Belkin | ODL 9 247 0 | 29 Feb 2020
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning | Zixin Wen | SSL 21 2 0 | 17 Feb 2020
Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality | Yi Zhang, Orestis Plevrakis, S. Du, Xingguo Li, Zhao Song, Sanjeev Arora | 21 51 0 | 16 Feb 2020
LaProp: Separating Momentum and Adaptivity in Adam | Liu Ziyin, Zhikang T. Wang, Masahito Ueda | ODL 8 18 0 | 12 Feb 2020
Memory capacity of neural networks with threshold and ReLU activations | Roman Vershynin | 31 21 0 | 20 Jan 2020
Revisiting Landscape Analysis in Deep Neural Networks: Eliminating Decreasing Paths to Infinity | Shiyu Liang, Ruoyu Sun, R. Srikant | 35 19 0 | 31 Dec 2019
Optimization for deep learning: theory and algorithms | Ruoyu Sun | ODL 19 168 0 | 19 Dec 2019
Towards Understanding the Spectral Bias of Deep Learning | Yuan Cao, Zhiying Fang, Yue Wu, Ding-Xuan Zhou, Quanquan Gu | 32 214 0 | 03 Dec 2019
Neural Contextual Bandits with UCB-based Exploration | Dongruo Zhou, Lihong Li, Quanquan Gu | 36 15 0 | 11 Nov 2019
Enhanced Convolutional Neural Tangent Kernels | Zhiyuan Li, Ruosong Wang, Dingli Yu, S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora | 16 131 0 | 03 Nov 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks | Lei Wu, Qingcan Wang, Chao Ma | ODL AI4CE 28 22 0 | 02 Nov 2019
Growing axons: greedy learning of neural networks with application to function approximation | Daria Fokina, Ivan Oseledets | 21 18 0 | 28 Oct 2019
The Local Elasticity of Neural Networks | Hangfeng He, Weijie J. Su | 40 44 0 | 15 Oct 2019
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks | Spencer Frei, Yuan Cao, Quanquan Gu | ODL 9 31 0 | 07 Oct 2019
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks | Sanjeev Arora, S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu | AAML 16 161 0 | 03 Oct 2019
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks | Yu Bai, J. Lee | 24 116 0 | 03 Oct 2019
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction | Pan Xu, F. Gao, Quanquan Gu | 28 83 0 | 18 Sep 2019
Stochastic AUC Maximization with Deep Neural Networks | Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang | 9 103 0 | 28 Aug 2019
Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization | T. Poggio, Andrzej Banburski, Q. Liao | ODL 31 161 0 | 25 Aug 2019
The generalization error of random features regression: Precise asymptotics and double descent curve | Song Mei, Andrea Montanari | 57 626 0 | 14 Aug 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization | Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee | ODL 42 51 0 | 24 Jul 2019
Benign Overfitting in Linear Regression | Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler | MLT 8 762 0 | 26 Jun 2019
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy | Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang | 24 108 0 | 25 Jun 2019
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks | Kaifeng Lyu, Jian Li | 52 322 0 | 13 Jun 2019
Kernel and Rich Regimes in Overparametrized Models | Blake E. Woodworth, Suriya Gunasekar, Pedro H. P. Savarese, E. Moroshko, Itay Golan, J. Lee, Daniel Soudry, Nathan Srebro | 24 352 0 | 13 Jun 2019
Generalization bounds for deep convolutional neural networks | Philip M. Long, Hanie Sedghi | MLT 42 89 0 | 29 May 2019
Norm-based generalisation bounds for multi-class convolutional neural networks | Antoine Ledent, Waleed Mustafa, Yunwen Lei, Marius Kloft | 18 5 0 | 29 May 2019
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems | Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang | ODL 21 57 0 | 28 May 2019
What Can ResNet Learn Efficiently, Going Beyond Kernels? | Zeyuan Allen-Zhu, Yuanzhi Li | 24 183 0 | 24 May 2019
Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems | Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki | MLT 16 33 0 | 23 May 2019
A type of generalization error induced by initialization in deep neural networks | Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma | 9 49 0 | 19 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation | Colin Wei, Tengyu Ma | 20 109 0 | 09 May 2019