2209.12581
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
26 September 2022
Gábor Melis
MoMe
Papers citing
"Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization"
12 / 12 papers shown
Title
Stochastic Weight Averaging Revisited
Hao Guo
Jiyong Jin
B. Liu
63
30
0
03 Jan 2022
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Lu Yu
Krishnakumar Balasubramanian
S. Volgushev
Murat A. Erdogdu
94
52
0
14 Jun 2020
On the Variance of the Adaptive Learning Rate and Beyond
Liyuan Liu
Haoming Jiang
Pengcheng He
Weizhu Chen
Xiaodong Liu
Jianfeng Gao
Jiawei Han
ODL
289
1,907
0
08 Aug 2019
Anytime Tail Averaging
Nicolas Le Roux
MoMe
22
5
0
13 Feb 2019
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
137
1,670
0
14 Mar 2018
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
173
1,096
0
07 Aug 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
429
2,945
0
15 Sep 2016
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.0K
150,312
0
22 Dec 2014
New insights and perspectives on the natural gradient method
James Martens
ODL
76
630
0
03 Dec 2014
A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien
Mark Schmidt
Francis R. Bach
187
261
0
10 Dec 2012
Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir
Tong Zhang
160
576
0
08 Dec 2012
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Alexander Rakhlin
Ohad Shamir
Karthik Sridharan
176
768
0
26 Sep 2011