Two-temperature logistic regression based on the Tsallis divergence

Abstract

We develop a variant of multiclass logistic regression that achieves three properties: i) We minimize a non-convex surrogate loss which makes the method robust to outliers, ii) our method allows transitioning between non-convex and convex losses by the choice of the parameters, iii) the surrogate loss is Bayes consistent, even in the non-convex case. The algorithm has one weight vector per class and the surrogate loss is a function of the linear activations (one per class). The surrogate loss of an example with linear activation vector $\mathbf{a}$ and class $c$ has the form $-\log_{t_1} \exp_{t_2}(a_c - G_{t_2}(\mathbf{a}))$, where the two temperatures $t_1$ and $t_2$ "temper" the $\log$ and $\exp$, respectively, and $G_{t_2}$ is a generalization of the log-partition function. We motivate this loss using the Tsallis divergence. As the temperature of the logarithm becomes smaller than the temperature of the exponential, the surrogate loss becomes "more quasi-convex". Various tunings of the temperatures recover previous methods, and tuning the degree of non-convexity is crucial in the experiments. The choice $t_1 < 1$ and $t_2 > 1$ performs best experimentally. We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail.
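
Below is a minimal sketch (not from the paper) of how such a two-temperature loss can be evaluated, assuming the standard tempered logarithm $\log_t(x) = (x^{1-t}-1)/(1-t)$ and tempered exponential $\exp_t(x) = [1+(1-t)x]_+^{1/(1-t)}$, and assuming the normalizer $G_{t_2}$ is found by bisection (the paper may compute it differently); all function names are illustrative.

```python
import numpy as np

def log_t(x, t):
    # Tempered logarithm: log_t(x) = (x^(1-t) - 1) / (1 - t); reduces to log(x) at t = 1.
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    # Tempered exponential: exp_t(x) = [1 + (1-t) x]_+^(1 / (1-t)); reduces to exp(x) at t = 1.
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def normalizer(a, t2, iters=80):
    # Find G such that sum_c exp_{t2}(a_c - G) = 1 by bisection (an assumed scheme).
    # The sum is non-increasing in G, is >= 1 at G = max(a) (that term alone equals exp_t(0) = 1),
    # and tends to 0 as G grows, so a root is bracketed once the upper end drops below 1.
    lo = np.max(a)
    hi = lo + 1.0
    while np.sum(exp_t(a - hi, t2)) > 1.0:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(exp_t(a - mid, t2)) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def two_temperature_loss(activations, c, t1, t2):
    # Surrogate loss -log_{t1} exp_{t2}(a_c - G_{t2}(a)) for an example with true class c.
    # For t1 < 1 the loss stays bounded even when the tempered probability of class c is 0.
    a = np.asarray(activations, dtype=float)
    p_c = exp_t(a[c] - normalizer(a, t2), t2)  # tempered "softmax" probability of class c
    return -log_t(p_c, t1)

a = np.array([2.0, -1.0, 0.5])
print(two_temperature_loss(a, c=0, t1=0.7, t2=1.3))  # t1 < 1 < t2: capped loss, heavy tail
print(two_temperature_loss(a, c=0, t1=1.0, t2=1.0))  # recovers ordinary softmax cross-entropy
```

With $t_1 = t_2 = 1$ the sketch reduces to vanilla multiclass logistic loss, matching the role of the two temperatures described in the abstract.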
