ResearchTrend.AI

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

4 October 2018
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT, ODL
arXiv: 1810.02054

Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks"

50 / 882 papers shown
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh
A. Jafari
Puneeth Salad
Pranav Sharma
Ali Saheb Pasand
A. Ghodsi
143
18
0
16 Oct 2021
Provable Regret Bounds for Deep Online Learning and Control
Xinyi Chen
Edgar Minasyan
Jason D. Lee
Elad Hazan
115
6
0
15 Oct 2021
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li
Tianhao Wang
Sanjeev Arora
MLT
121
105
0
13 Oct 2021
AIR-Net: Adaptive and Implicit Regularization Neural Network for Matrix Completion
Zhemin Li
Tao Sun
Hongxia Wang
Bao Wang
88
6
0
12 Oct 2021
Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks
Shuai Zhang
Meng Wang
Sijia Liu
Pin-Yu Chen
Jinjun Xiong
UQ, CV, MLT
85
13
0
12 Oct 2021
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations
Jiayao Zhang
Hua Wang
Weijie J. Su
96
8
0
11 Oct 2021
Deep Bayesian inference for seismic imaging with tasks
Ali Siahkoohi
G. Rizzuti
Felix J. Herrmann
BDL, UQ, CV
97
21
0
10 Oct 2021
Does Preprocessing Help Training Over-parameterized Neural Networks?
Zhao Song
Shuo Yang
Ruizhe Zhang
98
50
0
09 Oct 2021
Distinguishing rule- and exemplar-based generalization in learning systems
Ishita Dasgupta
Erin Grant
Thomas Griffiths
88
16
0
08 Oct 2021
New Insights into Graph Convolutional Networks using Neural Tangent Kernels
Mahalakshmi Sabanayagam
Pascal Esser
Debarghya Ghoshdastidar
64
6
0
08 Oct 2021
Neural Tangent Kernel Empowered Federated Learning
Kai Yue
Richeng Jin
Ryan Pilgrim
Chau-Wai Wong
D. Baron
H. Dai
FedML
73
17
0
07 Oct 2021
On the Global Convergence of Gradient Descent for multi-layer ResNets in the mean-field regime
Zhiyan Ding
Shi Chen
Qin Li
S. Wright
MLT, AI4CE
95
11
0
06 Oct 2021
Efficient and Private Federated Learning with Partially Trainable Networks
Hakim Sidahmed
Zheng Xu
Ankush Garg
Yuan Cao
Mingqing Chen
FedML
124
13
0
06 Oct 2021
Scale-invariant Learning by Physics Inversion
Philipp Holl
V. Koltun
Nils Thuerey
PINN, AI4CE
76
9
0
30 Sep 2021
On the Provable Generalization of Recurrent Neural Networks
Lifu Wang
Bo Shen
Bo Hu
Xing Cao
144
8
0
29 Sep 2021
The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation
Anna Winnicki
Joseph Lubars
Michael Livesay
R. Srikant
74
3
0
28 Sep 2021
Theory of overparametrization in quantum neural networks
Martín Larocca
Nathan Ju
Diego García-Martín
Patrick J. Coles
M. Cerezo
103
192
0
23 Sep 2021
Deformed semicircle law and concentration of nonlinear random matrices for ultra-wide neural networks
Zhichao Wang
Yizhe Zhu
107
20
0
20 Sep 2021
AdaLoss: A computationally-efficient and provably convergent adaptive gradient method
Xiaoxia Wu
Yuege Xie
S. Du
Rachel A. Ward
ODL
49
7
0
17 Sep 2021
Stationary Density Estimation of Itô Diffusions Using Deep Learning
Yiqi Gu
J. Harlim
Senwei Liang
Haizhao Yang
86
12
0
09 Sep 2021
NASI: Label- and Data-agnostic Neural Architecture Search at Initialization
Yao Shu
Shaofeng Cai
Zhongxiang Dai
Beng Chin Ooi
K. H. Low
98
44
0
02 Sep 2021
When and how epochwise double descent happens
Cory Stephenson
Tyler Lee
82
15
0
26 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT, AI4CE
113
44
0
25 Aug 2021
Fast Sketching of Polynomial Kernels of Polynomial Degree
Zhao Song
David P. Woodruff
Zheng Yu
Lichen Zhang
82
41
0
21 Aug 2021
Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation
Simon Eberle
Arnulf Jentzen
Adrian Riekert
G. Weiss
76
12
0
18 Aug 2021
Towards Understanding Theoretical Advantages of Complex-Reaction Networks
Shao-Qun Zhang
Gaoxin Wei
Zhi Zhou
54
17
0
15 Aug 2021
A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions
Arnulf Jentzen
Adrian Riekert
82
13
0
10 Aug 2021
Convergence of gradient descent for learning linear neural networks
Gabin Maxime Nguegnang
Holger Rauhut
Ulrich Terstiege
MLT
67
18
0
04 Aug 2021
Geometry of Linear Convolutional Networks
Kathlén Kohn
Thomas Merkh
Guido Montúfar
Matthew Trager
117
20
0
03 Aug 2021
Towards General Function Approximation in Zero-Sum Markov Games
Baihe Huang
Jason D. Lee
Zhaoran Wang
Zhuoran Yang
88
47
0
30 Jul 2021
Deep Networks Provably Classify Data on Curves
Tingran Wang
Sam Buchanan
D. Gilboa
John N. Wright
83
9
0
29 Jul 2021
Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers
Colin Wei
Yining Chen
Tengyu Ma
79
92
0
28 Jul 2021
Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel
Dominic Richards
Ilja Kuzborskij
82
29
0
27 Jul 2021
SGD with a Constant Large Learning Rate Can Converge to Local Maxima
Liu Ziyin
Botao Li
James B. Simon
Masakuni Ueda
104
9
0
25 Jul 2021
Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
Yuyang Deng
Mohammad Mahdi Kamani
M. Mahdavi
FedML
68
14
0
22 Jul 2021
Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Pranjal Awasthi
Alex K. Tang
Aravindan Vijayaraghavan
MLT
59
21
0
21 Jul 2021
Distribution of Classification Margins: Are All Data Equal?
Andrzej Banburski
Fernanda De La Torre
Nishka Pant
Ishana Shastri
T. Poggio
72
4
0
21 Jul 2021
Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping
Ilja Kuzborskij
Csaba Szepesvári
105
7
0
12 Jul 2021
Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
Arnulf Jentzen
Adrian Riekert
55
23
0
09 Jul 2021
Rethinking Positional Encoding
Jianqiao Zheng
Sameera Ramasinghe
Simon Lucey
85
52
0
06 Jul 2021
Partition and Code: learning how to compress graphs
Giorgos Bouritsas
Andreas Loukas
Nikolaos Karalias
M. Bronstein
81
13
0
05 Jul 2021
Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks
Xin Liu
Zhisong Pan
Wei Tao
155
9
0
05 Jul 2021
A Theoretical Analysis of Fine-tuning with Linear Teachers
Gal Shachaf
Alon Brutzkus
Amir Globerson
91
17
0
04 Jul 2021
Random Neural Networks in the Infinite Width Limit as Gaussian Processes
Boris Hanin
BDL
100
48
0
04 Jul 2021
A Generalized Lottery Ticket Hypothesis
Ibrahim Alabdulmohsin
L. Markeeva
Daniel Keysers
Ilya O. Tolstikhin
69
6
0
03 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
104
268
0
01 Jul 2021
Fast Margin Maximization via Dual Acceleration
Ziwei Ji
Nathan Srebro
Matus Telgarsky
67
39
0
01 Jul 2021
Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity
Arthur Jacot
François Ged
Berfin Şimşek
Clément Hongler
Franck Gabriel
86
55
0
30 Jun 2021
A Non-parametric View of FedAvg and FedProx: Beyond Stationary Points
Lili Su
Jiaming Xu
Pengkun Yang
FedML
85
13
0
29 Jun 2021
Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit
Yichi Zhou
Shihong Song
Huishuai Zhang
Jun Zhu
Wei Chen
Tie-Yan Liu
32
0
0
29 Jun 2021