ResearchTrend.AI

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

4 October 2018
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT, ODL
arXiv: 1810.02054

Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"

50 / 882 papers shown
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh
A. Jafari
Puneeth Salad
Pranav Sharma
Ali Saheb Pasand
A. Ghodsi
143
18
0
16 Oct 2021
Provable Regret Bounds for Deep Online Learning and Control
Xinyi Chen
Edgar Minasyan
Jason D. Lee
Elad Hazan
115
6
0
15 Oct 2021
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
121
105
0
13 Oct 2021
AIR-Net: Adaptive and Implicit Regularization Neural Network for Matrix Completion
Zhemin Li
Tao Sun
Hongxia Wang
Bao Wang
88
6
0
12 Oct 2021
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks
Shuai Zhang
Meng Wang
Sijia Liu
Pin-Yu Chen
Jinjun Xiong
UQ, CV, MLT
85
13
0
12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
96
8
0
11 Oct 2021
Deep Bayesian inference for seismic imaging with tasks
Ali Siahkoohi
G. Rizzuti
Felix J. Herrmann
BDL, UQ, CV
97
21
0
10 Oct 2021
Does Preprocessing Help Training Over-parameterized Neural Networks?
Zhao Song
Shuo Yang
Ruizhe Zhang
98
50
0
09 Oct 2021
Distinguishing rule- and exemplar-based generalization in learning systems
Ishita Dasgupta
Erin Grant
Thomas Griffiths
88
16
0
08 Oct 2021
New Insights into Graph Convolutional Networks using Neural Tangent Kernels
Mahalakshmi Sabanayagam
Pascal Esser
Debarghya Ghoshdastidar
64
6
0
08 Oct 2021
Neural Tangent Kernel Empowered Federated Learning
Kai Yue
Richeng Jin
Ryan Pilgrim
Chau-Wai Wong
D. Baron
H. Dai
FedML
73
17
0
07 Oct 2021
On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Zhiyan Ding
Shi Chen
Qin Li
S. Wright
MLT, AI4CE
95
11
0
06 Oct 2021
Efficient and Private Federated Learning with Partially Trainable Networks
Hakim Sidahmed
Zheng Xu
Ankush Garg
Yuan Cao
Mingqing Chen
FedML
124
13
0
06 Oct 2021
Scale-invariant Learning by Physics Inversion
Philipp Holl
V. Koltun
Nils Thuerey
PINN, AI4CE
76
9
0
30 Sep 2021
On the Provable Generalization of Recurrent Neural Networks
Lifu Wang
Bo Shen
Bo Hu
Xing Cao
144
8
0
29 Sep 2021
The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Anna Winnicki
Joseph Lubars
Michael Livesay
R. Srikant
74
3
0
28 Sep 2021
Theory of overparametrization in quantum neural networks
Martín Larocca
Nathan Ju
Diego García-Martín
Patrick J. Coles
M. Cerezo
103
192
0
23 Sep 2021
Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
Zhichao Wang
Yizhe Zhu
107
20
0
20 Sep 2021
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
Xiaoxia Wu
Yuege Xie
S. Du
Rachel A. Ward
ODL
49
7
0
17 Sep 2021
Stationary Density Estimation of Itô Diffusions Using Deep Learning
Yiqi Gu
J. Harlim
Senwei Liang
Haizhao Yang
86
12
0
09 Sep 2021
NASI: Label- and Data-agnostic Neural Architecture Search at Initialization
Yao Shu
Shaofeng Cai
Zhongxiang Dai
Beng Chin Ooi
K. H. Low
98
44
0
02 Sep 2021
When and how epochwise double descent happens
Cory Stephenson
Tyler Lee
82
15
0
26 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT, AI4CE
113
44
0
25 Aug 2021
Fast Sketching of Polynomial Kernels of Polynomial Degree
Zhao Song
David P. Woodruff
Zheng Yu
Lichen Zhang
82
41
0
21 Aug 2021
Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation
Simon Eberle
Arnulf Jentzen
Adrian Riekert
G. Weiss
76
12
0
18 Aug 2021
Towards Understanding Theoretical Advantages of Complex-Reaction Networks
Shao-Qun Zhang
Gaoxin Wei
Zhi Zhou
54
17
0
15 Aug 2021
A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions
Arnulf Jentzen
Adrian Riekert
82
13
0
10 Aug 2021
Convergence of gradient descent for learning linear neural networks
Gabin Maxime Nguegnang
Holger Rauhut
Ulrich Terstiege
MLT
67
18
0
04 Aug 2021
Geometry of Linear Convolutional Networks
Kathlén Kohn
Thomas Merkh
Guido Montúfar
Matthew Trager
117
20
0
03 Aug 2021
Towards General Function Approximation in Zero-Sum Markov Games
Baihe Huang
Jason D. Lee
Zhaoran Wang
Zhuoran Yang
88
47
0
30 Jul 2021
Deep Networks Provably Classify Data on Curves
Tingran Wang
Sam Buchanan
D. Gilboa
John N. Wright
83
9
0
29 Jul 2021
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
Colin Wei
Yining Chen
Tengyu Ma
79
92
0
28 Jul 2021
Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
Dominic Richards
Ilja Kuzborskij
82
29
0
27 Jul 2021
SGD with a Constant Large Learning Rate Can Converge to Local Maxima
Liu Ziyin
Botao Li
James B. Simon
Masakuni Ueda
104
9
0
25 Jul 2021
Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
Yuyang Deng
Mohammad Mahdi Kamani
M. Mahdavi
FedML
68
14
0
22 Jul 2021
Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Pranjal Awasthi
Alex K. Tang
Aravindan Vijayaraghavan
MLT
59
21
0
21 Jul 2021
Distribution of Classification Margins: Are All Data Equal?
Andrzej Banburski
Fernanda De La Torre
Nishka Pant
Ishana Shastri
T. Poggio
72
4
0
21 Jul 2021
Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
Ilja Kuzborskij
Csaba Szepesvári
105
7
0
12 Jul 2021
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
Arnulf Jentzen
Adrian Riekert
55
23
0
09 Jul 2021
Rethinking Positional Encoding
Jianqiao Zheng
Sameera Ramasinghe
Simon Lucey
85
52
0
06 Jul 2021
Partition and Code: learning how to compress graphs
Giorgos Bouritsas
Andreas Loukas
Nikolaos Karalias
M. Bronstein
81
13
0
05 Jul 2021
Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks
Xin Liu
Zhisong Pan
Wei Tao
155
9
0
05 Jul 2021
A Theoretical Analysis of Fine-tuning with Linear Teachers
Gal Shachaf
Alon Brutzkus
Amir Globerson
91
17
0
04 Jul 2021
Random Neural Networks in the Infinite Width Limit as Gaussian Processes
Boris Hanin
BDL
100
48
0
04 Jul 2021
A Generalized Lottery Ticket Hypothesis
Ibrahim Alabdulmohsin
L. Markeeva
Daniel Keysers
Ilya O. Tolstikhin
69
6
0
03 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
104
268
0
01 Jul 2021
Fast Margin Maximization via Dual Acceleration
Ziwei Ji
Nathan Srebro
Matus Telgarsky
67
39
0
01 Jul 2021
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
Arthur Jacot
François Ged
Berfin Şimşek
Clément Hongler
Franck Gabriel
86
55
0
30 Jun 2021
A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points
Lili Su
Jiaming Xu
Pengkun Yang
FedML
85
13
0
29 Jun 2021
Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit
Yichi Zhou
Shihong Song
Huishuai Zhang
Jun Zhu
Wei Chen
Tie-Yan Liu
32
0
0
29 Jun 2021