k-decay: A New Method For Learning Rate Schedule

Abstract

Recent work has shown that optimizing the learning rate schedule can make training deep neural networks more accurate and more efficient. In this paper, we put forward the k-decay method for learning rate schedules, which alters the rate of change of the learning rate through its k-th order derivative. The new schedule uses a hyper-parameter \(k\) to control the degree of decay, and \(k = 1\) recovers the original schedule. We derive the k-decay factor \(\frac{t^k}{T^k}\) for learning rate schedules and apply it to the polynomial, cosine, and exponential functions. We evaluate the k-decay method with the new polynomial function on the CIFAR-10 and CIFAR-100 datasets using different neural networks (ResNet, Wide ResNet, and DenseNet). The k-decay method improves on the state-of-the-art results for most of them. Accuracy improves by up to 1.08 \% on the CIFAR-10 dataset and by up to 2.07 \% on the CIFAR-100 dataset. Our experiments show that accuracy improves as \(k\) increases.
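
To make the k-decay factor concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the factor \(\frac{t^k}{T^k}\) simply replaces the usual progress ratio \(\frac{t}{T}\) in the standard polynomial and cosine decay formulas. The function names and the parameters `lr_start`, `lr_end`, and `power` are hypothetical and do not appear in the abstract.

```python
import math


def k_decay_factor(t, T, k=1.0):
    """Return t^k / T^k, the k-decay factor named in the abstract.

    t is the current step, T the total number of steps, and k the decay
    hyper-parameter. With k = 1 this is the ordinary progress ratio t / T.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    t = min(max(t, 0), T)  # clamp so the factor stays in [0, 1]
    return (t ** k) / (T ** k)


def polynomial_k_decay(t, T, lr_start, lr_end=0.0, k=1.0, power=1.0):
    """Polynomial decay with t / T replaced by t^k / T^k (assumed form)."""
    remaining = 1.0 - k_decay_factor(t, T, k)
    return (lr_start - lr_end) * remaining ** power + lr_end


def cosine_k_decay(t, T, lr_start, lr_end=0.0, k=1.0):
    """Cosine decay with t / T replaced by t^k / T^k (assumed form)."""
    phase = math.pi * k_decay_factor(t, T, k)
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(phase))


if __name__ == "__main__":
    # Compare the learning rate at the midpoint for several values of k.
    for k in (1.0, 2.0, 3.0):
        print(k, polynomial_k_decay(50, 100, lr_start=0.1, k=k))
```

Under these assumptions, \(\frac{t^k}{T^k} < \frac{t}{T}\) for \(0 < t < T\) whenever \(k > 1\), so a larger \(k\) keeps the learning rate higher for more of training and concentrates the decay near the end.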
