Gradient Descent Provably Optimizes Over-parameterized Neural Networks

4 October 2018
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
    MLT, ODL
ArXiv (abs) · PDF · HTML (arXiv:1810.02054)

Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"

50 / 882 papers shown
Wide neural networks: From non-Gaussian random fields at initialization to the NTK geometry of training
Luís Carvalho
João L. Costa
José Mourão
Gonçalo Oliveira
AI4CE
62
2
0
06 Apr 2023
Learning with augmented target information: An alternative theory of Feedback Alignment
Huzi Cheng
Joshua W. Brown
CVBM
65
0
0
03 Apr 2023
Depth Separation with Multilayer Mean-Field Networks
Y. Ren
Mo Zhou
Rong Ge
OOD
85
3
0
03 Apr 2023
Fast Convergence of Random Reshuffling under Over-Parameterization and the Polyak-Łojasiewicz Condition
Chen Fan
Christos Thrampoulidis
Mark Schmidt
58
2
0
02 Apr 2023
An Over-parameterized Exponential Regression
Yeqi Gao
Sridhar Mahadevan
Zhao Song
81
39
0
29 Mar 2023
Kernel interpolation generalizes poorly
Yicheng Li
Haobo Zhang
Qian Lin
78
11
0
28 Mar 2023
Convergence Guarantees of Overparametrized Wide Deep Inverse Prior
Nathan Buskulic
Yvain Quéau
M. Fadili
BDL
71
2
0
20 Mar 2023
Learning Fractals by Gradient Descent
Cheng-Hao Tu
Hong-You Chen
David Carlyn
Wei-Lun Chao
57
3
0
14 Mar 2023
Phase Diagram of Initial Condensation for Two-layer Neural Networks
Zheng Chen
Yuqing Li
Yaoyu Zhang
Zhaoguang Zhou
Z. Xu
MLT, AI4CE
104
11
0
12 Mar 2023
Linear CNNs Discover the Statistical Structure of the Dataset Using Only the Most Dominant Frequencies
Hannah Pinson
Joeri Lenaerts
V. Ginis
62
3
0
03 Mar 2023
Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks
Ye Li
Songcan Chen
Shengyi Huang
PINN
51
3
0
03 Mar 2023
M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation
Junjie Yang
Xuxi Chen
Tianlong Chen
Zhangyang Wang
Yitao Liang
65
3
0
28 Feb 2023
On the existence of minimizers in shallow residual ReLU neural network optimization landscapes
Steffen Dereich
Arnulf Jentzen
Sebastian Kassing
70
7
0
28 Feb 2023
Gauss-Newton Temporal Difference Learning with Nonlinear Function Approximation
Zhifa Ke
Junyu Zhang
Zaiwen Wen
72
0
0
25 Feb 2023
Learning to Generalize Provably in Learning to Optimize
Junjie Yang
Tianlong Chen
Mingkang Zhu
Fengxiang He
Dacheng Tao
Yitao Liang
Zhangyang Wang
79
7
0
22 Feb 2023
Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
Grigory Khromov
Sidak Pal Singh
155
8
0
21 Feb 2023
Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron
Weihang Xu
S. Du
105
16
0
20 Feb 2023
The Expressive Power of Tuning Only the Normalization Layers
Angeliki Giannou
Shashank Rajput
Dimitris Papailiopoulos
66
8
0
15 Feb 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li
Ming Wang
Sijia Liu
Pin-Yu Chen
ViT, MLT
138
64
0
12 Feb 2023
Generalization Ability of Wide Neural Networks on $\mathbb{R}$
Jianfa Lai
Manyun Xu
Rui Chen
Qi-Rong Lin
87
23
0
12 Feb 2023
Effects of noise on the overparametrization of quantum neural networks
Diego García-Martín
Martín Larocca
M. Cerezo
85
18
0
10 Feb 2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
Siddharth Singh
A. Bhatele
69
9
0
10 Feb 2023
Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks
Shuai Zhang
Ming Wang
Pin-Yu Chen
Sijia Liu
Songtao Lu
Miaoyuan Liu
MLT
118
17
0
06 Feb 2023
Rethinking Gauss-Newton for learning over-parameterized models
Michael Arbel
Romain Menegaux
Pierre Wolinski
AI4CE
98
6
0
06 Feb 2023
AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios
Zhongzhan Huang
Mingfu Liang
Liang Lin
88
5
0
05 Feb 2023
Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
François Caron
Fadhel Ayed
Paul Jung
Hoileong Lee
Juho Lee
Hongseok Yang
135
2
0
02 Feb 2023
Implicit Regularization Leads to Benign Overfitting for Sparse Linear Regression
Mo Zhou
Rong Ge
121
2
0
01 Feb 2023
Gradient Descent in Neural Networks as Sequential Learning in RKBS
A. Shilton
Sunil R. Gupta
Santu Rana
Svetha Venkatesh
MLT
128
1
0
01 Feb 2023
A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence
Carlo Alfano
Rui Yuan
Patrick Rebeschini
145
15
0
30 Jan 2023
CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning
Peng Zhang
Yingbo Zhou
Ming Hu
Xin Fu
Xian Wei
Mingsong Chen
FedML
54
1
0
28 Jan 2023
A Simple Algorithm For Scaling Up Kernel Methods
Tengyu Xu
Bryan Kelly
Semyon Malamud
69
0
0
26 Jan 2023
ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients
Guihong Li
Yuedong Yang
Kartikeya Bhardwaj
R. Marculescu
121
63
0
26 Jan 2023
Limitations of Piecewise Linearity for Efficient Robustness Certification
Klas Leino
AAML
74
6
0
21 Jan 2023
Convergence beyond the over-parameterized regime using Rayleigh quotients
David A. R. Robin
Kevin Scaman
Marc Lelarge
60
3
0
19 Jan 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
128
13
0
30 Dec 2022
Bayesian Interpolation with Deep Linear Networks
Boris Hanin
Alexander Zlokapa
151
26
0
29 Dec 2022
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks
Ilja Kuzborskij
Csaba Szepesvári
75
4
0
28 Dec 2022
COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks
Md. Ismail Hossain
Mohammed Rakib
M. M. L. Elahi
Nabeel Mohammed
Shafin Rahman
145
1
0
24 Dec 2022
Learning threshold neurons via the "edge of stability"
Kwangjun Ahn
Sébastien Bubeck
Sinho Chewi
Y. Lee
Felipe Suarez
Yi Zhang
MLT
108
41
0
14 Dec 2022
Leveraging Unlabeled Data to Track Memorization
Mahsa Forouzesh
Hanie Sedghi
Patrick Thiran
NoLa, TDI
87
4
0
08 Dec 2022
Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points
Mayank Baranwal
Param Budhraja
V. Raj
A. Hota
67
3
0
07 Dec 2022
Improved Convergence Guarantees for Shallow Neural Networks
A. Razborov
ODL
66
1
0
05 Dec 2022
Infinite-width limit of deep linear neural networks
Lénaïc Chizat
Maria Colombo
Xavier Fernández-Real
Alessio Figalli
90
16
0
29 Nov 2022
On the Power of Foundation Models
Yang Yuan
106
38
0
29 Nov 2022
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing
Josh Alman
Jiehao Liang
Zhao Song
Ruizhe Zhang
Danyang Zhuo
136
31
0
25 Nov 2022
Linear RNNs Provably Learn Linear Dynamic Systems
Lifu Wang
Tianyu Wang
Shengwei Yi
Bo Shen
Bo Hu
Xing Cao
53
0
0
19 Nov 2022
Mechanistic Mode Connectivity
Ekdeep Singh Lubana
Eric J. Bigelow
Robert P. Dick
David M. Krueger
Hidenori Tanaka
118
49
0
15 Nov 2022
Characterizing the Spectrum of the NTK via a Power Series Expansion
Michael Murray
Hui Jin
Benjamin Bowman
Guido Montúfar
99
11
0
15 Nov 2022
Spectral Evolution and Invariance in Linear-width Neural Networks
Zhichao Wang
A. Engel
Anand D. Sarwate
Ioana Dumitriu
Tony Chiang
116
18
0
11 Nov 2022
Overparameterized random feature regression with nearly orthogonal data
Zhichao Wang
Yizhe Zhu
77
4
0
11 Nov 2022