k-decay: A New Method For Learning Rate Schedule

Recent work has shown that optimizing the learning rate (LR) schedule can make training deep neural networks more accurate and more efficient. In this paper, we propose the k-decay method, in which the LR is changed by its k-th order derivative, to obtain a new family of LR schedules. In these schedules, a hyper-parameter k controls the degree of LR change, with the original schedules recovered at k = 1. We then derive the k-decay factor for the LR schedule and apply it to the polynomial, cosine, and exponential functions. By tuning k, one can identify the best LR schedule. We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks (ResNet, Wide ResNet, and DenseNet). Our experiments show that the k-decay method improves over state-of-the-art results on most of them: accuracy improves by 1.08% on CIFAR-10, by 2.07% on CIFAR-100, and by 1.25% on ImageNet. The method adds no extra computational cost, improves training performance, and is easy to use.
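As a minimal sketch of the idea, the snippet below implements polynomial and cosine schedules modulated by a k-dependent factor. It assumes the k-decay factor enters the schedule as (t/T)^k, so that k = 1 recovers the standard schedules and larger k holds the LR high for longer before decaying faster near the end; the function names, default values, and this exact factor are illustrative assumptions, not the authors' verbatim formulation.

```python
import math

def polynomial_k_decay(t, T, lr0=0.1, lr_end=0.0, N=2.0, k=1.0):
    """Polynomial LR schedule with an assumed k-decay factor (t/T)**k.

    t: current step, T: total steps, N: polynomial power,
    k: hyper-parameter controlling the degree of LR change (k=1 is standard).
    """
    ratio = (t / T) ** k  # assumed form of the k-decay factor
    return lr_end + (lr0 - lr_end) * (1.0 - ratio) ** N

def cosine_k_decay(t, T, lr0=0.1, lr_end=0.0, k=1.0):
    """Cosine LR schedule with the same assumed k-decay factor."""
    ratio = (t / T) ** k
    return lr_end + (lr0 - lr_end) * (1.0 + math.cos(math.pi * ratio)) / 2.0

# With k > 1 the LR stays near lr0 longer: at the halfway point,
# cosine_k_decay(50, 100, k=2.0) > cosine_k_decay(50, 100, k=1.0).
```

Both functions start at lr0 at t = 0 and reach lr_end at t = T for any k > 0, which is the property the k-decay hyper-parameter preserves while reshaping the decay curve in between.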