Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
4 October 2018. arXiv:1810.02054. [MLT, ODL]
Papers citing "Gradient Descent Provably Optimizes Over-parameterized Neural Networks" (showing 50 of 302):
- CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning. Peng Zhang, Yingbo Zhou, Ming Hu, Xin Fu, Xian Wei, Mingsong Chen. 28 Jan 2023. [FedML]
- A Simple Algorithm For Scaling Up Kernel Methods. Tengyu Xu, Bryan Kelly, Semyon Malamud. 26 Jan 2023.
- ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients. Guihong Li, Yuedong Yang, Kartikeya Bhardwaj, R. Marculescu. 26 Jan 2023.
- Convergence beyond the over-parameterized regime using Rayleigh quotients. David A. R. Robin, Kevin Scaman, Marc Lelarge. 19 Jan 2023.
- An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models. Yufeng Zhang, Boyi Liu, Qi Cai, Lingxiao Wang, Zhaoran Wang. 30 Dec 2022.
- Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks. Ilja Kuzborskij, Csaba Szepesvári. 28 Dec 2022.
- COLT: Cyclic Overlapping Lottery Tickets for Faster Pruning of Convolutional Neural Networks. Md. Ismail Hossain, Mohammed Rakib, M. M. L. Elahi, Nabeel Mohammed, Shafin Rahman. 24 Dec 2022.
- Learning threshold neurons via the "edge of stability". Kwangjun Ahn, Sébastien Bubeck, Sinho Chewi, Y. Lee, Felipe Suarez, Yi Zhang. 14 Dec 2022. [MLT]
- Leveraging Unlabeled Data to Track Memorization. Mahsa Forouzesh, Hanie Sedghi, Patrick Thiran. 08 Dec 2022. [NoLa, TDI]
- Generalized Gradient Flows with Provable Fixed-Time Convergence and Fast Evasion of Non-Degenerate Saddle Points. Mayank Baranwal, Param Budhraja, V. Raj, A. Hota. 07 Dec 2022.
- Infinite-width limit of deep linear neural networks. Lénaïc Chizat, Maria Colombo, Xavier Fernández-Real, Alessio Figalli. 29 Nov 2022.
- Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing. Josh Alman, Jiehao Liang, Zhao Song, Ruizhe Zhang, Danyang Zhuo. 25 Nov 2022.
- Mechanistic Mode Connectivity. Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David M. Krueger, Hidenori Tanaka. 15 Nov 2022.
- Characterizing the Spectrum of the NTK via a Power Series Expansion. Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar. 15 Nov 2022.
- A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks. Zhengdao Chen, Eric Vanden-Eijnden, Joan Bruna. 28 Oct 2022. [MLT]
- LOFT: Finding Lottery Tickets through Filter-wise Training. Qihan Wang, Chen Dun, Fangshuo Liao, C. Jermaine, Anastasios Kyrillidis. 28 Oct 2022.
- Sparsity in Continuous-Depth Neural Networks. H. Aliee, Till Richter, Mikhail Solonin, I. Ibarra, Fabian J. Theis, Niki Kilbertus. 26 Oct 2022.
- Global Convergence of SGD On Two Layer Neural Nets. Pulkit Gopalani, Anirbit Mukherjee. 20 Oct 2022.
- Theoretical Guarantees for Permutation-Equivariant Quantum Neural Networks. Louis Schatzki, Martín Larocca, Quynh T. Nguyen, F. Sauvage, M. Cerezo. 18 Oct 2022.
- From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent. Satyen Kale, Jason D. Lee, Chris De Sa, Ayush Sekhari, Karthik Sridharan. 13 Oct 2022.
- What Can the Neural Tangent Kernel Tell Us About Adversarial Robustness? Nikolaos Tsilivis, Julia Kempe. 11 Oct 2022. [AAML]
- Efficient NTK using Dimensionality Reduction. Nir Ailon, Supratim Shit. 10 Oct 2022.
- On skip connections and normalisation layers in deep optimisation. L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey. 10 Oct 2022. [ODL]
- Dynamical Isometry for Residual Networks. Advait Gadhikar, R. Burkholz. 05 Oct 2022. [ODL, AI4CE]
- On the optimization and generalization of overparameterized implicit neural networks. Tianxiang Gao, Hongyang Gao. 30 Sep 2022. [MLT, AI4CE]
- Neural Networks Efficiently Learn Low-Dimensional Representations with SGD. Alireza Mousavi-Hosseini, Sejun Park, M. Girotti, Ioannis Mitliagkas, Murat A. Erdogdu. 29 Sep 2022. [MLT]
- Lazy vs hasty: linearization in deep networks impacts learning schedule based on example difficulty. Thomas George, Guillaume Lajoie, A. Baratin. 19 Sep 2022.
- Approximation results for Gradient Descent trained Shallow Neural Networks in 1d. R. Gentile, G. Welper. 17 Sep 2022. [ODL]
- Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization). Zhenyu Zhu, Fanghui Liu, Grigorios G. Chrysos, V. Cevher. 15 Sep 2022.
- Generalization Properties of NAS under Activation and Skip Connection Search. Zhenyu Zhu, Fanghui Liu, Grigorios G. Chrysos, V. Cevher. 15 Sep 2022. [AI4CE]
- Towards Understanding Mixture of Experts in Deep Learning. Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li. 04 Aug 2022. [MLT, MoE]
- Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability. Z. Li, Zixuan Wang, Jian Li. 26 Jul 2022.
- Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit. Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang. 18 Jul 2022.
- Efficient Augmentation for Imbalanced Deep Learning. Damien Dablain, C. Bellinger, Bartosz Krawczyk, Nitesh V. Chawla. 13 Jul 2022.
- Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. Zhiyuan Li, Tianhao Wang, Jason D. Lee, Sanjeev Arora. 08 Jul 2022.
- Neural Networks can Learn Representations with Gradient Descent. Alexandru Damian, Jason D. Lee, Mahdi Soltanolkotabi. 30 Jun 2022. [SSL, MLT]
- Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis. Alexander Munteanu, Simon Omlor, Zhao Song, David P. Woodruff. 26 Jun 2022.
- Learning sparse features can lead to overfitting in neural networks. Leonardo Petrini, Francesco Cagnetta, Eric Vanden-Eijnden, M. Wyart. 24 Jun 2022. [MLT]
- Large-width asymptotics for ReLU neural networks with α-Stable initializations. Stefano Favaro, S. Fortini, Stefano Peluchetti. 16 Jun 2022.
- Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks. Kaiqi Zhang, Ming Yin, Yu-Xiang Wang. 13 Jun 2022. [MQ]
- On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms. Lam M. Nguyen, Trang H. Tran. 13 Jun 2022.
- Neural Collapse: A Review on Modelling Principles and Generalization. Vignesh Kothapalli. 08 Jun 2022.
- Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials. Eshaan Nichani, Yunzhi Bai, Jason D. Lee. 08 Jun 2022.
- Non-convex online learning via algorithmic equivalence. Udaya Ghai, Zhou Lu, Elad Hazan. 30 May 2022.
- Global Convergence of Over-parameterized Deep Equilibrium Models. Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin. 27 May 2022.
- Empirical Phase Diagram for Three-layer Neural Networks with Infinite Width. Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu. 24 May 2022.
- Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture. Libin Zhu, Chaoyue Liu, M. Belkin. 24 May 2022. [GNN, AI4CE]
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias. Itay Safran, Gal Vardi, Jason D. Lee. 18 May 2022. [MLT]
- Trading Positional Complexity vs. Deepness in Coordinate Networks. Jianqiao Zheng, Sameera Ramasinghe, Xueqian Li, Simon Lucey. 18 May 2022.
- High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation. Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang. 03 May 2022. [MLT]