
On the Convergence Rate of Training Recurrent Neural Networks
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song
29 October 2018 · arXiv:1810.12065

Papers citing "On the Convergence Rate of Training Recurrent Neural Networks"

28 / 128 citing papers shown

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu · 27 Nov 2019

Machine Learning for Prediction with Missing Dynamics
J. Harlim, Shixiao W. Jiang, Senwei Liang, Haizhao Yang · 13 Oct 2019 · AI4CE

A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit · 27 Sep 2019

Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Ziwei Ji, Matus Telgarsky · 26 Sep 2019

Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently
Rong Ge, Runzhe Wang, Haoyu Zhao · 26 Sep 2019 · TDI

Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee · 24 Jul 2019 · ODL

A Fine-Grained Spectral Perspective on Neural Networks
Greg Yang, Hadi Salman · 24 Jul 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
Yuanzhi Li, Colin Wei, Tengyu Ma · 10 Jul 2019

An Improved Analysis of Training Over-parameterized Deep Neural Networks
Difan Zou, Quanquan Gu · 11 Jun 2019

Quadratic Suffices for Over-parametrization via Matrix Chernoff Bound
Zhao Song, Xin Yang · 09 Jun 2019

Learning in Gated Neural Networks
Ashok Vardhan Makkuva, Sewoong Oh, Sreeram Kannan, Pramod Viswanath · 06 Jun 2019

The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Ronen Basri, David Jacobs, Yoni Kasten, S. Kritchman · 02 Jun 2019

Enhancing Adversarial Defense by k-Winners-Take-All
Chang Xiao, Peilin Zhong, Changxi Zheng · 25 May 2019 · AAML

What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu, Yuanzhi Li · 24 May 2019

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems
Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki · 23 May 2019 · MLT

Stabilize Deep ResNet with A Sharp Scaling Factor τ
Huishuai Zhang, Da Yu, Mingyang Yi, Wei Chen, Tie-Yan Liu · 17 Mar 2019

Implicit Regularization in Over-parameterized Neural Networks
M. Kubo, Ryotaro Banno, Hidetaka Manabe, Masataka Minoji · 05 Mar 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington · 18 Feb 2019

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
Greg Yang · 13 Feb 2019

Generalization Error Bounds of Gradient Descent for Learning Over-parameterized Deep ReLU Networks
Yuan Cao, Quanquan Gu · 04 Feb 2019 · ODL, MLT, AI4CE

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Zeyuan Allen-Zhu, Yuanzhi Li · 04 Feb 2019 · MLT, LRM

Towards a Theoretical Understanding of Hashing-Based Neural Nets
Yibo Lin, Zhao Song, Lin F. Yang · 26 Dec 2018

Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu · 21 Nov 2018 · ODL

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang · 12 Nov 2018 · MLT

A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-Zhu, Yuanzhi Li, Zhao Song · 09 Nov 2018 · AI4CE, ODL

Gradient Descent Finds Global Minima of Deep Neural Networks
S. Du, J. Lee, Haochuan Li, Liwei Wang, Masayoshi Tomizuka · 09 Nov 2018 · ODL

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington · 14 Jun 2018

Spectral Filtering for General Linear Dynamical Systems
Elad Hazan, Holden Lee, Karan Singh, Cyril Zhang, Yi Zhang · 12 Feb 2018