Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
arXiv:1810.02054 · 4 October 2018 · MLT, ODL
Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks" (50 of 882 papers shown):
Wide neural networks: From non-Gaussian random fields at initialization to the NTK geometry of training · Luís Carvalho, João L. Costa, José Mourão, Gonçalo Oliveira · AI4CE · 06 Apr 2023
Learning with augmented target information: An alternative theory of Feedback Alignment · Huzi Cheng, Joshua W. Brown · CVBM · 03 Apr 2023
Depth Separation with Multilayer Mean-Field Networks · Y. Ren, Mo Zhou, Rong Ge · OOD · 03 Apr 2023
Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition · Chen Fan, Christos Thrampoulidis, Mark Schmidt · 02 Apr 2023
An Over-parameterized Exponential Regression · Yeqi Gao, Sridhar Mahadevan, Zhao Song · 29 Mar 2023
Kernel interpolation generalizes poorly · Yicheng Li, Haobo Zhang, Qian Lin · 28 Mar 2023
Convergence Guarantees of Overparametrized Wide Deep Inverse Prior · Nathan Buskulic, Yvain Quéau, M. Fadili · BDL · 20 Mar 2023
Learning Fractals by Gradient Descent · Cheng-Hao Tu, Hong-You Chen, David Carlyn, Wei-Lun Chao · 14 Mar 2023
Phase Diagram of Initial Condensation for Two-layer Neural Networks · Zheng Chen, Yuqing Li, Yaoyu Zhang, Zhaoguang Zhou, Z. Xu · MLT, AI4CE · 12 Mar 2023
Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies · Hannah Pinson, Joeri Lenaerts, V. Ginis · 03 Mar 2023
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks · Ye Li, Songcan Chen, Shengyi Huang · PINN · 03 Mar 2023
M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation · Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yitao Liang · 28 Feb 2023
On the existence of minimizers in shallow residual ReLU neural network optimization landscapes · Steffen Dereich, Arnulf Jentzen, Sebastian Kassing · 28 Feb 2023
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation · Zhifa Ke, Junyu Zhang, Zaiwen Wen · 25 Feb 2023
Learning to Generalize Provably in Learning to Optimize · Junjie Yang, Tianlong Chen, Mingkang Zhu, Fengxiang He, Dacheng Tao, Yitao Liang, Zhangyang Wang · 22 Feb 2023
Some Fundamental Aspects about Lipschitz Continuity of Neural Networks · Grigory Khromov, Sidak Pal Singh · 21 Feb 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron · Weihang Xu, S. Du · 20 Feb 2023
The Expressive Power of Tuning Only the Normalization Layers · Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos · 15 Feb 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity · Hongkang Li, Ming Wang, Sijia Liu, Pin-Yu Chen · ViT, MLT · 12 Feb 2023
Generalization Ability of Wide Neural Networks on $\mathbb{R}$ · Jianfa Lai, Manyun Xu, Rui Chen, Qi-Rong Lin · 12 Feb 2023
Effects of noise on the overparametrization of quantum neural networks · Diego García-Martín, Martín Larocca, M. Cerezo · 10 Feb 2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training · Siddharth Singh, A. Bhatele · 10 Feb 2023
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks · Shuai Zhang, Ming Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miaoyuan Liu · MLT · 06 Feb 2023
Rethinking Gauss-Newton for learning over-parameterized models · Michael Arbel, Romain Menegaux, Pierre Wolinski · AI4CE · 06 Feb 2023
AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios · Zhongzhan Huang, Mingfu Liang, Liang Lin · 05 Feb 2023
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning · François Caron, Fadhel Ayed, Paul Jung, Hoileong Lee, Juho Lee, Hongseok Yang · 02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression · Mo Zhou, Rong Ge · 01 Feb 2023
Gradient Descent in Neural Networks as Sequential Learning in RKBS · A. Shilton, Sunil R. Gupta, Santu Rana, Svetha Venkatesh · MLT · 01 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence · Carlo Alfano, Rui Yuan, Patrick Rebeschini · 30 Jan 2023
CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning · Peng Zhang, Yingbo Zhou, Ming Hu, Xin Fu, Xian Wei, Mingsong Chen · FedML · 28 Jan 2023
A Simple Algorithm For Scaling Up Kernel Methods · Tengyu Xu, Bryan Kelly, Semyon Malamud · 26 Jan 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients · Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, R. Marculescu · 26 Jan 2023
Limitations of Piecewise Linearity for Efficient Robustness Certification · Klas Leino · AAML · 21 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients · David A. R. Robin, Kevin Scaman, Marc Lelarge · 19 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models · Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang · 30 Dec 2022
Bayesian Interpolation with Deep Linear Networks · Boris Hanin, Alexander Zlokapa · 29 Dec 2022
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks · Ilja Kuzborskij, Csaba Szepesvári · 28 Dec 2022
COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks · Md. Ismail Hossain, Mohammed Rakib, M. M. L. Elahi, Nabeel Mohammed, Shafin Rahman · 24 Dec 2022
Learning threshold neurons via the "edge of stability" · Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang · MLT · 14 Dec 2022
Leveraging Unlabeled Data to Track Memorization · Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran · NoLa, TDI · 08 Dec 2022
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points · Mayank Baranwal, Param Budhraja, V. Raj, A. Hota · 07 Dec 2022
Improved Convergence Guarantees for Shallow Neural Networks · A. Razborov · ODL · 05 Dec 2022
Infinite-width limit of deep linear neural networks · Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli · 29 Nov 2022
On the Power of Foundation Models · Yang Yuan · 29 Nov 2022
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing · Josh Alman, Jiehao Liang, Zhao Song, Ruizhe Zhang, Danyang Zhuo · 25 Nov 2022
Linear RNNs Provably Learn Linear Dynamic Systems · Lifu Wang, Tianyu Wang, Shengwei Yi, Bo Shen, Bo Hu, Xing Cao · 19 Nov 2022
Mechanistic Mode Connectivity · Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka · 15 Nov 2022
Characterizing the Spectrum of the NTK via a Power Series Expansion · Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar · 15 Nov 2022
Spectral Evolution and Invariance in Linear-width Neural Networks · Zhichao Wang, A. Engel, Anand D. Sarwate, Ioana Dumitriu, Tony Chiang · 11 Nov 2022
Overparameterized random feature regression with nearly orthogonal data · Zhichao Wang, Yizhe Zhu · 11 Nov 2022