arXiv:1902.08234
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
21 February 2019
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba [ODL]

Papers citing "An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise"

47 / 47 papers shown
1. Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
   Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger C. Grosse (09 Jul 2019)
2. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
   Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington (18 Feb 2019)
3. Measuring the Effects of Data Parallelism on Neural Network Training
   Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang (08 Nov 2018)
4. Three Mechanisms of Weight Decay Regularization
   Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse (29 Oct 2018)
5. A Coordinate-Free Construction of Scalable Natural Gradient
   Kevin Luk, Roger C. Grosse (30 Aug 2018)
6. A Surprising Linear Relationship Predicts Test Performance in Deep Networks
   Q. Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, T. Poggio (25 Jul 2018)
7. How Does Batch Normalization Help Optimization?
   Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry [ODL] (29 May 2018)
8. Stability and Convergence Trade-off of Iterative Optimization Algorithms
   Yuansi Chen, Chi Jin, Bin Yu (04 Apr 2018)
9. Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
   Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger C. Grosse [BDL] (12 Mar 2018)
10. Understanding Short-Horizon Bias in Stochastic Meta-Optimization
    Yuhuai Wu, Mengye Ren, Renjie Liao, Roger C. Grosse (06 Mar 2018)
11. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
    Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma (01 Mar 2018)
12. A Walk with SGD
    Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio (24 Feb 2018)
13. Characterizing Implicit Bias in Terms of Optimization Geometry
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [AI4CE] (22 Feb 2018)
14. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
    Sanjeev Arora, Nadav Cohen, Elad Hazan (19 Feb 2018)
15. Noisy Natural Gradient as Variational Inference
    Guodong Zhang, Shengyang Sun, David Duvenaud, Roger C. Grosse [ODL] (06 Dec 2017)
16. Three Factors Influencing Minima in SGD
    Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey (13 Nov 2017)
17. Don't Decay the Learning Rate, Increase the Batch Size
    Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le [ODL] (01 Nov 2017)
18. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
    Pratik Chaudhari, Stefano Soatto [MLT] (30 Oct 2017)
19. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
    Han Xiao, Kashif Rasul, Roland Vollgraf (25 Aug 2017)
20. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
    Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng [MLT] (19 Jul 2017)
21. Exploring Generalization in Deep Learning
    Behnam Neyshabur, Srinadh Bhojanapalli, David A. McAllester, Nathan Srebro [FAtt] (27 Jun 2017)
22. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He [3DH] (08 Jun 2017)
23. Spectral Norm Regularization for Improving the Generalizability of Deep Learning
    Yuichi Yoshida, Takeru Miyato (31 May 2017)
24. Train longer, generalize better: closing the generalization gap in large batch training of neural networks
    Elad Hoffer, Itay Hubara, Daniel Soudry [ODL] (24 May 2017)
25. The Marginal Value of Adaptive Gradient Methods in Machine Learning
    Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht [ODL] (23 May 2017)
26. Geometry of Optimization and Implicit Regularization in Deep Learning
    Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro [AI4CE] (08 May 2017)
27. Stochastic Gradient Descent as Approximate Bayesian Inference
    Stephan Mandt, Matthew D. Hoffman, David M. Blei [BDL] (13 Apr 2017)
28. How to Escape Saddle Points Efficiently
    Chi Jin, Rong Ge, Praneeth Netrapalli, Sham Kakade, Michael I. Jordan [ODL] (02 Mar 2017)
29. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
    Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky (13 Feb 2017)
30. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
    Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina [ODL] (06 Nov 2016)
31. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
    Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean [AIMat] (26 Sep 2016)
32. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL] (15 Sep 2016)
33. Optimization Methods for Large-Scale Machine Learning
    Léon Bottou, Frank E. Curtis, J. Nocedal (15 Jun 2016)
34. A Kronecker-factored approximate Fisher matrix for convolution layers
    Roger C. Grosse, James Martens [ODL] (03 Feb 2016)
35. Deep Residual Learning for Image Recognition
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun [MedIm] (10 Dec 2015)
36. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, ..., Chong-Jun Wang, Bo Xiao, Dani Yogatama, J. Zhan, Zhenyao Zhu (08 Dec 2015)
37. Adding Gradient Noise Improves Learning for Very Deep Networks
    Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens [AI4CE, ODL] (21 Nov 2015)
38. Stochastic modified equations and adaptive stochastic gradient algorithms
    Qianxiao Li, Cheng Tai, E. Weinan (19 Nov 2015)
39. Efficient Per-Example Gradient Computations
    Ian Goodfellow (07 Oct 2015)
40. Train faster, generalize better: Stability of stochastic gradient descent
    Moritz Hardt, Benjamin Recht, Y. Singer (03 Sep 2015)
41. Optimizing Neural Networks with Kronecker-factored Approximate Curvature
    James Martens, Roger C. Grosse [ODL] (19 Mar 2015)
42. Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition
    Rong Ge, Furong Huang, Chi Jin, Yang Yuan (06 Mar 2015)
43. New insights and perspectives on the natural gradient method
    James Martens [ODL] (03 Dec 2014)
44. The Loss Surfaces of Multilayer Networks
    A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun [ODL] (30 Nov 2014)
45. Very Deep Convolutional Networks for Large-Scale Image Recognition
    Karen Simonyan, Andrew Zisserman [FAtt, MDE] (04 Sep 2014)
46. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
    Aaron Defazio, Francis R. Bach, Simon Lacoste-Julien [ODL] (01 Jul 2014)
47. No More Pesky Learning Rates
    Tom Schaul, Sixin Zhang, Yann LeCun (06 Jun 2012)