On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

19 February 2018
Sanjeev Arora, Nadav Cohen, Elad Hazan
arXiv:1802.06509

Papers citing "On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization"

Showing 19 of 119 citing papers.
Neural Empirical Bayes
Saeed Saremi, Aapo Hyvärinen
06 Mar 2019

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba
Communities: ODL
21 Feb 2019

Understanding over-parameterized deep networks by geometrization
Xiao Dong, Ling Zhou
Communities: GNN, AI4CE
11 Feb 2019

Stiffness: A New Perspective on Generalization in Neural Networks
Stanislav Fort, Pawel Krzysztof Nowak, Stanislaw Jastrzebski, S. Narayanan
28 Jan 2019

Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du, Wei Hu
24 Jan 2019

Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer
12 Dec 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu
Communities: ODL
21 Nov 2018

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei, J. Lee, Qiang Liu, Tengyu Ma
12 Oct 2018

A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora, Nadav Cohen, Noah Golowich, Wei Hu
04 Oct 2018

Gradient Descent Provably Optimizes Over-parameterized Neural Networks
S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh
Communities: MLT, ODL
04 Oct 2018

Gradient descent aligns the layers of deep linear networks
Ziwei Ji, Matus Telgarsky
04 Oct 2018

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Ohad Shamir
23 Sep 2018

On the Learning Dynamics of Deep Neural Networks
Rémi Tachet des Combes, Mohammad Pezeshki, Samira Shabanian, Aaron Courville, Yoshua Bengio
18 Sep 2018

Filter Distillation for Network Compression
Xavier Suau, Luca Zappella, N. Apostoloff
20 Jul 2018

ResNet with one-neuron hidden layers is a Universal Approximator
Hongzhou Lin, Stefanie Jegelka
28 Jun 2018

Deep Neural Networks with Multi-Branch Architectures Are Less Non-Convex
Hongyang R. Zhang, Junru Shao, Ruslan Salakhutdinov
06 Jun 2018

High-dimensional dynamics of generalization error in neural networks
Madhu S. Advani, Andrew M. Saxe
Communities: AI4CE
10 Oct 2017

A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
Weijie Su, Stephen P. Boyd, Emmanuel J. Candes
04 Mar 2015

The Loss Surfaces of Multilayer Networks
A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun
Communities: ODL
30 Nov 2014