Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks

26 September 2019

Papers citing "Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks"

39 / 39 papers shown

Title
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods Hossein Taheri Christos Thrampoulidis Arya Mazumdar MLT 31 0 0 13 Oct 2024
Performance of NPG in Countable State-Space Average-Cost RL Yashaswini Murthy Isaac Grosof S. T. Maguluri R. Srikant OffRL 29 1 0 30 May 2024
Lifted RDT based capacity analysis of the 1-hidden layer treelike sign perceptrons neural networks M. Stojnic 22 1 0 13 Dec 2023
Capacity of the treelike sign perceptrons neural networks with one hidden layer -- RDT based upper bounds M. Stojnic 16 4 0 13 Dec 2023
How to Protect Copyright Data in Optimization of Large Language Models? T. Chu Zhao-quan Song Chiwun Yang 32 29 0 23 Aug 2023
Fine-grained analysis of non-parametric estimation for pairwise learning Junyu Zhou Shuo Huang Han Feng Puyu Wang Ding-Xuan Zhou 37 1 0 31 May 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients David A. R. Robin Kevin Scaman Marc Lelarge 17 3 0 19 Jan 2023
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks Ilja Kuzborskij Csaba Szepesvári 21 4 0 28 Dec 2022
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing Josh Alman Jiehao Liang Zhao-quan Song Ruizhe Zhang Danyang Zhuo 71 31 0 25 Nov 2022
When Expressivity Meets Trainability: Fewer than $n$ Neurons Can Work Jiawei Zhang Yushun Zhang Mingyi Hong Ruoyu Sun Z. Luo 21 10 0 21 Oct 2022
Global Convergence of SGD On Two Layer Neural Nets Pulkit Gopalani Anirbit Mukherjee 18 5 0 20 Oct 2022
Approximation results for Gradient Descent trained Shallow Neural Networks in $1d$ R. Gentile G. Welper ODL 44 6 0 17 Sep 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability Z. Li Zixuan Wang Jian Li 19 42 0 26 Jul 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit Boaz Barak Benjamin L. Edelman Surbhi Goel Sham Kakade Eran Malach Cyril Zhang 25 123 0 18 Jul 2022
Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity Jianyi Yang Shaolei Ren 24 3 0 02 Jul 2022
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis Alexander Munteanu Simon Omlor Zhao-quan Song David P. Woodruff 22 15 0 26 Jun 2022
Randomly Initialized One-Layer Neural Networks Make Data Linearly Separable Promit Ghosal Srinath Mahankali Yihang Sun MLT 17 4 0 24 May 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation Jimmy Ba Murat A. Erdogdu Taiji Suzuki Zhichao Wang Denny Wu Greg Yang MLT 29 121 0 03 May 2022
Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks Bartlomiej Polaczyk J. Cyranka ODL 28 3 0 28 Jan 2022
AutoBalance: Optimized Loss Functions for Imbalanced Data Mingchen Li Xuechen Zhang Christos Thrampoulidis Jiasi Chen Samet Oymak 14 67 0 04 Jan 2022
On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons Fangshuo Liao Anastasios Kyrillidis 36 16 0 05 Dec 2021
Subquadratic Overparameterization for Shallow Neural Networks Chaehwan Song Ali Ramezani-Kebrya Thomas Pethick Armin Eftekhari V. Cevher 22 32 0 02 Nov 2021
Provable Regret Bounds for Deep Online Learning and Control Xinyi Chen Edgar Minasyan Jason D. Lee Elad Hazan 21 6 0 15 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations Jiayao Zhang Hua Wang Weijie J. Su 27 7 0 11 Oct 2021
Does Preprocessing Help Training Over-parameterized Neural Networks? Zhao-quan Song Shuo Yang Ruizhe Zhang 27 49 0 09 Oct 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization Difan Zou Yuan Cao Yuanzhi Li Quanquan Gu MLT AI4CE 41 37 0 25 Aug 2021
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent Spencer Frei Quanquan Gu 15 25 0 25 Jun 2021
Experiments with Rich Regime Training for Deep Learning Xinyan Li A. Banerjee 21 2 0 26 Feb 2021
On the linearity of large non-linear models: when and why the tangent kernel is constant Chaoyue Liu Libin Zhu M. Belkin 14 138 0 02 Oct 2020
Deep Networks and the Multiple Manifold Problem Sam Buchanan D. Gilboa John N. Wright 166 39 0 25 Aug 2020
The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training Andrea Montanari Yiqiao Zhong 36 95 0 25 Jul 2020
Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory Yufeng Zhang Qi Cai Zhuoran Yang Yongxin Chen Zhaoran Wang OOD MLT 58 11 0 08 Jun 2020
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond Fanghui Liu Xiaolin Huang Yudong Chen Johan A. K. Suykens BDL 30 172 0 23 Apr 2020
Learning Parities with Neural Networks Amit Daniely Eran Malach 13 76 0 18 Feb 2020
Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning Zixin Wen SSL 16 2 0 17 Feb 2020
Memory capacity of neural networks with threshold and ReLU activations Roman Vershynin 21 21 0 20 Jan 2020
Deep Network Approximation for Smooth Functions Jianfeng Lu Zuowei Shen Haizhao Yang Shijun Zhang 33 247 0 09 Jan 2020
Towards Understanding the Spectral Bias of Deep Learning Yuan Cao Zhiying Fang Yue Wu Ding-Xuan Zhou Quanquan Gu 18 214 0 03 Dec 2019
Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems Atsushi Nitanda Geoffrey Chinot Taiji Suzuki MLT 8 33 0 23 May 2019