Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
10 July 2019, arXiv:1907.04595
Yuanzhi Li, Colin Wei, Tengyu Ma

Papers citing "Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks" (22 of 72 papers shown)

A Random Matrix Theory Approach to Damping in Deep Learning
Diego Granziol, Nicholas P. Baskerville
AI4CE, ODL
15 Nov 2020
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate
Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu
04 Nov 2020
Effective Regularization Through Loss-Function Metalearning
Santiago Gonzalez, Risto Miikkulainen
02 Oct 2020
Learning Optimal Representations with the Decodable Information Bottleneck
Yann Dubois, Douwe Kiela, D. Schwab, Ramakrishna Vedantam
27 Sep 2020
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
Keyulu Xu, Mozhi Zhang, Jingling Li, S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
MLT
24 Sep 2020
Deep Networks and the Multiple Manifold Problem
Sam Buchanan, D. Gilboa, John N. Wright
25 Aug 2020
Boundary thickness and robustness in learning models
Yaoqing Yang, Rajiv Khanna, Yaodong Yu, A. Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
OOD
09 Jul 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts
ODL
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
15 Jun 2020
Feature Purification: How Adversarial Training Performs Robust Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
MLT, AAML
20 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
15 May 2020
On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan
15 Apr 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari
ODL
04 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Diego Granziol, Xingchen Wan, Samuel Albanie, Stephen J. Roberts
02 Mar 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc, A. Madry
24 Feb 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof J. Geras
21 Feb 2020
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks
Yu Bai, Jason D. Lee
03 Oct 2019
Finite Depth and Width Corrections to the Neural Tangent Kernel
Boris Hanin, Mihai Nica
MDE
13 Sep 2019
How Does Learning Rate Decay Help Modern Neural Networks?
Kaichao You, Mingsheng Long, Jianmin Wang, Michael I. Jordan
05 Aug 2019
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang
MLT
12 Nov 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016
Quantifying the probable approximation error of probabilistic inference programs
Marco F. Cusumano-Towner, Vikash K. Mansinghka
31 May 2016