Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization

26 September 2022 · Gábor Melis · v3 (latest) · Communities: MoMe
Links: ArXiv (abs) · PDF · HTML
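
For context on the technique the title names: several of the papers listed below (stochastic weight averaging, anytime tail averaging) build on the same primitive, a running arithmetic mean of the weight iterates kept alongside training. Below is a minimal sketch of that primitive only, assuming NumPy arrays as parameters and a fixed, hypothetical burn-in point; the paper's actual two-tailed scheme chooses the averaging window adaptively and is not reproduced here.

import numpy as np

class RunningWeightAverage:
    """Online arithmetic mean of weight snapshots.

    This is the basic building block behind stochastic weight
    averaging and tail averaging; the two-tailed method layers an
    adaptive choice of averaging window on top of it.
    """

    def __init__(self):
        self.count = 0
        self.average = None

    def update(self, weights: np.ndarray) -> None:
        # Incremental mean: avg_n = avg_{n-1} + (w_n - avg_{n-1}) / n
        self.count += 1
        if self.average is None:
            self.average = weights.astype(np.float64)
        else:
            self.average += (weights - self.average) / self.count

# Usage sketch: average only the tail of a mock training run.
# The burn-in threshold of 500 steps is an arbitrary illustration.
rng = np.random.default_rng(0)
avg = RunningWeightAverage()
weights = rng.normal(size=8)              # stand-in for model parameters
for step in range(1, 1001):
    weights += 0.01 * rng.normal(size=8)  # stand-in for an SGD update
    if step > 500:                        # skip early, far-from-optimum iterates
        avg.update(weights)
print(avg.average)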

Papers citing "Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization"

12 papers shown:
  • Stochastic Weight Averaging Revisited. Hao Guo, Jiyong Jin, B. Liu. 03 Jan 2022.
  • An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias. Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu. 14 Jun 2020.
  • On the Variance of the Adaptive Learning Rate and Beyond. Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han. 08 Aug 2019. [ODL]
  • Anytime Tail Averaging. Nicolas Le Roux. 13 Feb 2019. [MoMe]
  • Averaging Weights Leads to Wider Optima and Better Generalization. Pavel Izmailov, Dmitrii Podoprikhin, T. Garipov, Dmitry Vetrov, A. Wilson. 14 Mar 2018. [FedML, MoMe]
  • Regularizing and Optimizing LSTM Language Models. Stephen Merity, N. Keskar, R. Socher. 07 Aug 2017.
  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016. [ODL]
  • Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014. [ODL]
  • New insights and perspectives on the natural gradient method. James Martens. 03 Dec 2014. [ODL]
  • A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method. Simon Lacoste-Julien, Mark Schmidt, Francis R. Bach. 10 Dec 2012.
  • Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes. Ohad Shamir, Tong Zhang. 08 Dec 2012.
  • Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization. Alexander Rakhlin, Ohad Shamir, Karthik Sridharan. 26 Sep 2011.