
AdamD: Improved bias-correction in Adam

Abstract

Here I present a small update to the bias-correction term in the Adam optimizer that has the advantage of making smaller gradient updates in the first several steps of training. With the default bias correction, Adam may actually make larger-than-requested gradient updates early in training. By including only the well-justified bias correction of the second-moment gradient estimate, $v_t$, and excluding the bias correction on the first-moment estimate, $m_t$, we obtain these more desirable update properties in the first steps of training. The default implementation of Adam may be as sensitive as it is to the hyperparameters $\beta_1, \beta_2$ partly because of the originally proposed bias-correction procedure and its behavior in the early steps.
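
To make the proposed change concrete, here is a minimal NumPy sketch of a single update step in which only $v_t$ is bias-corrected while $m_t$ is used as-is. This is not the author's reference implementation; the function name adamd_step and its argument defaults are illustrative.

    import numpy as np

    def adamd_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Exponential moving averages of the gradient and its square.
        m = beta1 * m + (1 - beta1) * grad         # first-moment estimate m_t (no bias correction applied)
        v = beta2 * v + (1 - beta2) * grad ** 2    # second-moment estimate v_t
        # Bias-correct only the second moment (standard Adam would also divide m by 1 - beta1**t).
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m / (np.sqrt(v_hat) + eps)
        return param, m, v

Because $m_t$ is not divided by $1 - \beta_1^t$, the effective step in the first few iterations is scaled down by roughly $1 - \beta_1^t$ relative to standard Adam, which is what keeps early updates from exceeding the requested step size.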
