arXiv:1810.02054
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
4 October 2018
Tags: MLT, ODL
Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks" (showing 50 of 302)
| Title | Authors | Tags | Metrics | Date |
|---|---|---|---|---|
| A Theoretical Analysis of Fine-tuning with Linear Teachers | Gal Shachaf, Alon Brutzkus, Amir Globerson | | 34 / 17 / 0 | 04 Jul 2021 |
| Random Neural Networks in the Infinite Width Limit as Gaussian Processes | Boris Hanin | BDL | 32 / 43 / 0 | 04 Jul 2021 |
| AutoFormer: Searching Transformers for Visual Recognition | Minghao Chen, Houwen Peng, Jianlong Fu, Haibin Ling | ViT | 36 / 259 / 0 | 01 Jul 2021 |
| Locality defeats the curse of dimensionality in convolutional teacher-student scenarios | Alessandro Favero, Francesco Cagnetta, M. Wyart | | 30 / 31 / 0 | 16 Jun 2021 |
| Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms | A. Camuto, George Deligiannidis, Murat A. Erdogdu, Mert Gurbuzbalaban, Umut Şimşekli, Lingjiong Zhu | | 33 / 29 / 0 | 09 Jun 2021 |
| TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion | Saeed Soori, Bugra Can, Baourun Mu, Mert Gurbuzbalaban, M. Dehnavi | | 24 / 10 / 0 | 07 Jun 2021 |
| Practical Convex Formulation of Robust One-hidden-layer Neural Network Training | Yatong Bai, Tanmay Gautam, Yujie Gai, Somayeh Sojoudi | AAML | 27 / 3 / 0 | 25 May 2021 |
| Global Convergence of Three-layer Neural Networks in the Mean Field Regime | H. Pham, Phan-Minh Nguyen | MLT, AI4CE | 41 / 19 / 0 | 11 May 2021 |
| FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Convergence Analysis | Baihe Huang, Xiaoxiao Li, Zhao Song, Xin Yang | FedML | 31 / 16 / 0 | 11 May 2021 |
| RATT: Leveraging Unlabeled Data to Guarantee Generalization | Saurabh Garg, Sivaraman Balakrishnan, J. Zico Kolter, Zachary Chase Lipton | | 30 / 30 / 0 | 01 May 2021 |
| Generalization Guarantees for Neural Architecture Search with Train-Validation Split | Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi | AI4CE, OOD | 36 / 13 / 0 | 29 Apr 2021 |
| PCFGs Can Do Better: Inducing Probabilistic Context-Free Grammars with Many Symbols | Aaron Courville, Yanpeng Zhao, Kewei Tu | | 23 / 22 / 0 | 28 Apr 2021 |
| Understanding Overparameterization in Generative Adversarial Networks | Yogesh Balaji, M. Sajedi, Neha Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, S. Feizi | AI4CE | 22 / 21 / 0 | 12 Apr 2021 |
| A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions | Arnulf Jentzen, Adrian Riekert | MLT | 34 / 13 / 0 | 01 Apr 2021 |
| The Discovery of Dynamics via Linear Multistep Methods and Deep Learning: Error Estimation | Q. Du, Yiqi Gu, Haizhao Yang, Chao Zhou | | 26 / 20 / 0 | 21 Mar 2021 |
| Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy | Lucas Liebenwein, Cenk Baykal, Brandon Carter, David K Gifford, Daniela Rus | AAML | 40 / 71 / 0 | 04 Mar 2021 |
| Experiments with Rich Regime Training for Deep Learning | Xinyan Li, A. Banerjee | | 32 / 2 / 0 | 26 Feb 2021 |
| Learning with invariances in random features and kernel models | Song Mei, Theodor Misiakiewicz, Andrea Montanari | OOD | 46 / 89 / 0 | 25 Feb 2021 |
| On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) | Zhiyuan Li, Sadhika Malladi, Sanjeev Arora | | 44 / 78 / 0 | 24 Feb 2021 |
| Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases | Arnulf Jentzen, T. Kröger | ODL | 28 / 7 / 0 | 23 Feb 2021 |
| GIST: Distributed Training for Large-Scale Graph Convolutional Networks | Cameron R. Wolfe, Jingkang Yang, Arindam Chowdhury, Chen Dun, Artun Bayer, Santiago Segarra, Anastasios Kyrillidis | BDL, GNN, LRM | 54 / 9 / 0 | 20 Feb 2021 |
| A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions | Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek | | 28 / 24 / 0 | 19 Feb 2021 |
| On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent | Shahar Azulay, E. Moroshko, Mor Shpigel Nacson, Blake E. Woodworth, Nathan Srebro, Amir Globerson, Daniel Soudry | AI4CE | 33 / 73 / 0 | 19 Feb 2021 |
| A Local Convergence Theory for Mildly Over-Parameterized Two-Layer Neural Network | Mo Zhou, Rong Ge, Chi Jin | | 74 / 45 / 0 | 04 Feb 2021 |
| On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths | Quynh N. Nguyen | | 43 / 48 / 0 | 24 Jan 2021 |
| Reproducing Activation Function for Deep Learning | Senwei Liang, Liyao Lyu, Chunmei Wang, Haizhao Yang | | 36 / 21 / 0 | 13 Jan 2021 |
| A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks | Asaf Noy, Yi Tian Xu, Y. Aflalo, Lihi Zelnik-Manor, R. L. Jin | | 39 / 3 / 0 | 12 Jan 2021 |
| Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks | Quynh N. Nguyen, Marco Mondelli, Guido Montúfar | | 25 / 81 / 0 | 21 Dec 2020 |
| Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning | Zeyuan Allen-Zhu, Yuanzhi Li | FedML | 60 / 355 / 0 | 17 Dec 2020 |
| On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers | E. Weinan, Stephan Wojtowytsch | | 36 / 43 / 0 | 10 Dec 2020 |
| Neural collapse with unconstrained features | D. Mixon, Hans Parshall, Jianzong Pi | | 28 / 114 / 0 | 23 Nov 2020 |
| Gradient Starvation: A Learning Proclivity in Neural Networks | Mohammad Pezeshki, Sekouba Kaba, Yoshua Bengio, Aaron Courville, Doina Precup, Guillaume Lajoie | MLT | 50 / 257 / 0 | 18 Nov 2020 |
| CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee | Tengyu Xu, Yingbin Liang, Guanghui Lan | | 42 / 121 / 0 | 11 Nov 2020 |
| On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces | Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan | | 39 / 18 / 0 | 09 Nov 2020 |
| Federated Knowledge Distillation | Hyowoon Seo, Jihong Park, Seungeun Oh, M. Bennis, Seong-Lyun Kim | FedML | 31 / 91 / 0 | 04 Nov 2020 |
| Are wider nets better given the same number of parameters? | A. Golubeva, Behnam Neyshabur, Guy Gur-Ari | | 27 / 44 / 0 | 27 Oct 2020 |
| Neural Network Approximation: Three Hidden Layers Are Enough | Zuowei Shen, Haizhao Yang, Shijun Zhang | | 30 / 115 / 0 | 25 Oct 2020 |
| A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks | Zhiqi Bu, Shiyun Xu, Kan Chen | | 33 / 17 / 0 | 25 Oct 2020 |
| An Investigation of how Label Smoothing Affects Generalization | Blair Chen, Liu Ziyin, Zihao Wang, Paul Pu Liang | UQCV | 21 / 17 / 0 | 23 Oct 2020 |
| Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime | Andrea Agazzi, Jianfeng Lu | | 13 / 15 / 0 | 22 Oct 2020 |
| Deep Learning is Singular, and That's Good | Daniel Murfet, Susan Wei, Biwei Huang, Hui Li, Jesse Gell-Redman, T. Quella | UQCV | 24 / 26 / 0 | 22 Oct 2020 |
| Deep Reinforcement Learning for Adaptive Network Slicing in 5G for Intelligent Vehicular Systems and Smart Cities | A. Nassar, Y. Yilmaz | AI4CE | 19 / 55 / 0 | 19 Oct 2020 |
| A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix | T. Doan, Mehdi Abbana Bennani, Bogdan Mazoure, Guillaume Rabusseau, Pierre Alquier | CLL | 20 / 80 / 0 | 07 Oct 2020 |
| Computational Separation Between Convolutional and Fully-Connected Networks | Eran Malach, Shai Shalev-Shwartz | | 24 / 26 / 0 | 03 Oct 2020 |
| On the linearity of large non-linear models: when and why the tangent kernel is constant | Chaoyue Liu, Libin Zhu, M. Belkin | | 21 / 140 / 0 | 02 Oct 2020 |
| Neural Thompson Sampling | Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu | | 28 / 114 / 0 | 02 Oct 2020 |
| Deep Equals Shallow for ReLU Networks in Kernel Regimes | A. Bietti, Francis R. Bach | | 28 / 86 / 0 | 30 Sep 2020 |
| Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't | E. Weinan, Chao Ma, Stephan Wojtowytsch, Lei Wu | AI4CE | 22 / 133 / 0 | 22 Sep 2020 |
| Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot | Jingtong Su, Yihang Chen, Tianle Cai, Tianhao Wu, Ruiqi Gao, Liwei Wang, J. Lee | | 14 / 85 / 0 | 22 Sep 2020 |
| Tensor Programs III: Neural Matrix Laws | Greg Yang | | 14 / 43 / 0 | 22 Sep 2020 |