arXiv:1802.06509
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
19 February 2018
Sanjeev Arora
Nadav Cohen
Elad Hazan
Papers citing "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization" (19 of 119 papers shown)
Neural Empirical Bayes
Saeed Saremi
Aapo Hyvarinen
12
65
0
06 Mar 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen
Kevin Luk
Maxime Gazeau
Guodong Zhang
Harris Chan
Jimmy Ba
ODL
20
22
0
21 Feb 2019
Understanding over-parameterized deep networks by geometrization
Xiao Dong
Ling Zhou
GNN
AI4CE
21
7
0
11 Feb 2019
Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort
Pawel Krzysztof Nowak
Stanislaw Jastrzebski
S. Narayanan
21
94
0
28 Jan 2019
Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du
Wei Hu
21
94
0
24 Jan 2019
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
30
228
0
12 Dec 2018
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou
Yuan Cao
Dongruo Zhou
Quanquan Gu
ODL
33
446
0
21 Nov 2018
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei
J. Lee
Qiang Liu
Tengyu Ma
23
245
0
12 Oct 2018
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
27
281
0
04 Oct 2018
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du
Xiyu Zhai
Barnabás Póczós
Aarti Singh
MLT
ODL
53
1,250
0
04 Oct 2018
Gradient descent aligns the layers of deep linear networks
Ziwei Ji
Matus Telgarsky
30
248
0
04 Oct 2018
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Ohad Shamir
35
45
0
23 Sep 2018
On the Learning Dynamics of Deep Neural Networks
Rémi Tachet des Combes
Mohammad Pezeshki
Samira Shabanian
Aaron Courville
Yoshua Bengio
16
38
0
18 Sep 2018
Filter Distillation for Network Compression
Xavier Suau
Luca Zappella
N. Apostoloff
24
38
0
20 Jul 2018
ResNet with one-neuron hidden layers is a Universal Approximator
Hongzhou Lin
Stefanie Jegelka
41
227
0
28 Jun 2018
Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Hongyang R. Zhang
Junru Shao
Ruslan Salakhutdinov
39
14
0
06 Jun 2018
High-dimensional dynamics of generalization error in neural networks
Madhu S. Advani
Andrew M. Saxe
AI4CE
90
464
0
10 Oct 2017
A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su
Stephen P. Boyd
Emmanuel J. Candes
108
1,154
0
04 Mar 2015
The Loss Surfaces of Multilayer Networks
A. Choromańska
Mikael Henaff
Michaël Mathieu
Gerard Ben Arous
Yann LeCun
ODL
183
1,185
0
30 Nov 2014