
k-decay: A New Method For Learning Rate Schedule

Abstract

Recent work has shown that optimizing the learning rate (LR) schedule can be an accurate and efficient way to train deep neural networks. In this paper, we propose the k-decay method, in which the rate of change (ROC) of the LR is modified through its k-th order derivative to obtain a new LR schedule. In the new schedule, a new hyper-parameter k controls the degree of change of the LR, whereas the original schedule corresponds to k = 1. By applying the k-decay method repeatedly, one can identify the best LR schedule. We evaluate the k-decay method on the CIFAR and ImageNet datasets with different neural networks (ResNet, Wide ResNet, and DenseNet). Our experiments show that the k-decay method achieves improvements over the state-of-the-art results on most of them. Accuracy improved by 1.08% on the CIFAR-10 dataset and by 2.07% on the CIFAR-100 dataset. On ImageNet, accuracy improved by 1.25%. Our method is not only efficient but also easy to use.
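
The abstract does not state the schedule formula, so the following is only a minimal sketch under an assumption: that k-decay is applied to cosine annealing by replacing the training progress t/T with (t/T)^k, which reduces to the standard cosine schedule at k = 1. The function name `k_decay_cosine_lr` and its parameters are hypothetical and chosen for illustration.

```python
import math


def k_decay_cosine_lr(step, total_steps, lr_max, lr_min=0.0, k=1.0):
    """Cosine-annealed learning rate with an assumed k-decay modification.

    Assumption (not stated in the abstract): the progress t/T is replaced
    by (t/T)**k. With k = 1 this is plain cosine annealing.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if k <= 0:
        raise ValueError("k must be positive")
    # Clamp the step so the schedule stays at lr_min after training ends.
    t = min(max(step, 0), total_steps)
    progress = (t / total_steps) ** k
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare the schedule for a few values of k at selected steps.
    for k in (1.0, 2.0, 3.0):
        values = [k_decay_cosine_lr(s, 100, lr_max=0.1, k=k) for s in (0, 25, 50, 75, 100)]
        print(k, [round(v, 4) for v in values])
```

Under this assumed form, values of k above 1 keep the LR near its initial value for longer and decay it more sharply toward the end of training, while the start and end points of the schedule stay the same.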
