k-decay: A New Method For Learning Rate Schedule

13 April 2020

Abstract

Recent work has shown that optimizing the learning rate schedule can make training deep neural networks both more accurate and more efficient. In this paper, we propose the k-decay method for learning rate schedules, which adjusts the rate of change of the learning rate through its k-th order derivative. The new schedule uses a hyper-parameter \(k\) to control the degree of decay, and \(k = 1\) recovers the original schedule. We derive the k-decay factor \(\frac{t^k}{T^k}\), where \(t\) is the current iteration and \(T\) the total number of iterations, and apply it to polynomial, cosine, and exponential schedules. We evaluate k-decay with the new polynomial schedule on the CIFAR-10 and CIFAR-100 datasets using several networks (ResNet, Wide ResNet, and DenseNet). The k-decay method improves on state-of-the-art results in most of these settings, raising accuracy by 1.08% on CIFAR-10 and by 2.07% on CIFAR-100. Our experiments also show that accuracy improves as \(k\) increases.
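To make the role of the factor concrete, here is a minimal sketch that assumes k-decay simply replaces the usual progress ratio \(t/T\) with \(t^k/T^k\) inside standard polynomial and cosine schedules. The function names, the `lr_end` floor, and the `power` exponent are illustrative assumptions, not the paper's published code, and the exponential variant is omitted because its exact form is not given here.

```python
import math


def k_decay_factor(t: int, T: int, k: float) -> float:
    """Return t^k / T^k; this equals the ordinary ratio t/T when k = 1."""
    return (t / T) ** k


def poly_k_decay(t: int, T: int, lr0: float, lr_end: float = 0.0,
                 k: float = 1.0, power: float = 1.0) -> float:
    """Polynomial decay with t/T swapped for the k-decay factor (assumed form)."""
    return (lr0 - lr_end) * (1.0 - k_decay_factor(t, T, k)) ** power + lr_end


def cosine_k_decay(t: int, T: int, lr0: float, lr_end: float = 0.0,
                   k: float = 1.0) -> float:
    """Cosine annealing with t/T swapped for the k-decay factor (assumed form)."""
    progress = k_decay_factor(t, T, k)
    return lr_end + 0.5 * (lr0 - lr_end) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Compare how long the learning rate stays high for different k.
    T, lr0 = 100, 0.1
    for k in (1, 2, 3):
        schedule = [poly_k_decay(t, T, lr0, k=k) for t in (0, 25, 50, 75, 100)]
        print(f"k={k}:", [round(lr, 4) for lr in schedule])
```

With \(T = 100\) and an initial rate of 0.1, the linear case (\(k = 1\), `power=1`) gives 0.05 at \(t = 50\), while \(k = 2\) gives 0.075. Because \((t/T)^k\) shrinks for \(t < T\) as \(k\) grows, larger \(k\) holds the learning rate high for longer and then drops it more steeply near the end, while both endpoints (the initial rate at \(t = 0\), `lr_end` at \(t = T\)) stay fixed.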

