Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks
arXiv:2208.03941 · 8 August 2022 · ODL
Xin Liu, Wei Tao, Wei Li, Dazhi Zhan, Jun Wang, Zhisong Pan
Papers citing "Provable Acceleration of Nesterov's Accelerated Gradient Method over Heavy Ball Method in Training Over-Parameterized Neural Networks" (23 / 23 papers shown)
1. Parametric estimation of stochastic differential equations via online gradient descent
   Shogo H. Nakakita · 17 Oct 2022
2. Provable Convergence of Nesterov's Accelerated Gradient Method for Over-Parameterized Neural Networks
   Xin Liu, Zhisong Pan, Wei Tao · 05 Jul 2021
3. Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization
   Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy S. Smith · 23 Feb 2021
4. A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes
   Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl · ODL · 12 Feb 2021
5. On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths
   Quynh N. Nguyen · 24 Jan 2021
6. A Dynamical View on Optimization Algorithms of Overparameterized Neural Networks
   Zhiqi Bu, Shiyun Xu, Kan Chen · 25 Oct 2020
7. A Modular Analysis of Provable Acceleration via Polyak's Momentum: Training a Wide ReLU Network and a Deep Linear Network
   Jun-Kun Wang, Chi-Heng Lin, Jacob D. Abernethy · 04 Oct 2020
8. Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers
   Robin M. Schmidt, Frank Schneider, Philipp Hennig · ODL · 03 Jul 2020
9. The Recurrent Neural Tangent Kernel
   Sina Alemohammad, Zichao Wang, Randall Balestriero, Richard Baraniuk · AAML · 18 Jun 2020
10. Language Models are Few-Shot Learners
   Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · BDL · 28 May 2020
11. Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective
   Kaixuan Huang, Yuqing Wang, Molei Tao, T. Zhao · MLT · 14 Feb 2020
12. Enhanced Convolutional Neural Tangent Kernels
   Zhiyuan Li, Ruosong Wang, Dingli Yu, S. Du, Wei Hu, Ruslan Salakhutdinov, Sanjeev Arora · 03 Nov 2019
13. Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound
   Zhao Song, Xin Yang · 09 Jun 2019
14. Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels
   S. Du, Kangcheng Hou, Barnabás Póczós, Ruslan Salakhutdinov, Ruosong Wang, Keyulu Xu · 30 May 2019
15. On Exact Computation with an Infinitely Wide Neural Net
   Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang · 26 Apr 2019
16. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
   Sanjeev Arora, S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang · MLT · 24 Jan 2019
17. Gradient Descent Finds Global Minima of Deep Neural Networks
   S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka · ODL · 09 Nov 2018
18. Understanding the Acceleration Phenomenon via High-Resolution Differential Equations
   Bin Shi, S. Du, Michael I. Jordan, Weijie J. Su · 21 Oct 2018
19. Gradient Descent Provably Optimizes Over-parameterized Neural Networks
   S. Du, Xiyu Zhai, Barnabás Póczós, Aarti Singh · MLT, ODL · 04 Oct 2018
20. Neural Tangent Kernel: Convergence and Generalization in Neural Networks
   Arthur Jacot, Franck Gabriel, Clément Hongler · 20 Jun 2018
21. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
   Han Xiao, Kashif Rasul, Roland Vollgraf · 25 Aug 2017
22. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights
   Weijie Su, Stephen P. Boyd, Emmanuel J. Candes · 04 Mar 2015
23. Deep Learning in Neural Networks: An Overview
   Jürgen Schmidhuber · HAI · 30 Apr 2014