Two-temperature logistic regression based on the Tsallis divergence

We develop a variant of multiclass logistic regression that achieves three properties: i) we minimize a non-convex surrogate loss which makes the method robust to outliers, ii) our method allows transitioning between non-convex and convex losses by the choice of the temperature parameters, iii) the surrogate loss is Bayes consistent, even in the non-convex case. The algorithm has one weight vector per class and the surrogate loss is a function of the linear activations (one per class). The surrogate loss of an example with linear activation vector $\mathbf{a}$ and class label $c$ has the form $-\log_{t_1} \exp_{t_2}\!\big(a_c - G_{t_2}(\mathbf{a})\big)$, where the two temperatures $t_1$ and $t_2$ "temper" the $\log$ and the $\exp$, respectively, and $G_{t_2}(\mathbf{a})$ is a generalization of the log-partition function. We motivate this loss using the Tsallis divergence. As the temperature $t_1$ of the logarithm becomes smaller than the temperature $t_2$ of the exponential, the surrogate loss becomes "more quasi-convex". Various tunings of the temperatures recover previous methods, and tuning the degree of non-convexity is crucial in the experiments. The choice $t_1 < 1$ and $t_2 > 1$ performs best experimentally. We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail.
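To make the definitions concrete, here is a minimal NumPy sketch of the two-temperature loss. The function names (log_t, exp_t, normalization, two_temperature_loss) are ours, and the fixed-point solver for $G_{t_2}(\mathbf{a})$ is an assumed implementation detail that is reasonable for $t_2 \ge 1$, not necessarily the procedure used in the paper.

```python
import numpy as np

def log_t(x, t):
    # Tempered logarithm: log_t(x) = (x^(1-t) - 1) / (1 - t); recovers log(x) as t -> 1.
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    # Tempered exponential: exp_t(x) = [1 + (1-t) x]_+^(1/(1-t)); recovers exp(x) as t -> 1.
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def normalization(a, t2, n_iters=20):
    # G_{t2}(a): generalization of the log-partition function, chosen so that the
    # tempered probabilities sum to one, i.e. sum_c exp_{t2}(a_c - G_{t2}(a)) = 1.
    # Computed here by a fixed-point iteration (a sketch assuming t2 >= 1).
    mu = np.max(a)
    a0 = a - mu
    a_hat = a0
    for _ in range(n_iters):
        z = np.sum(exp_t(a_hat, t2))
        a_hat = a0 * z ** (1.0 - t2)
    z = np.sum(exp_t(a_hat, t2))
    return mu - log_t(1.0 / z, t2)

def two_temperature_loss(a, c, t1, t2):
    # Surrogate loss of an example with activation vector a and true class c:
    # -log_{t1} exp_{t2}(a_c - G_{t2}(a)).  t1 = t2 = 1 recovers softmax cross-entropy.
    y_hat_c = exp_t(a[c] - normalization(a, t2), t2)
    return -log_t(y_hat_c, t1)

# Example: t1 < 1 bounds the loss, t2 > 1 gives the predictive distribution a heavier tail.
a = np.array([2.0, -1.0, 0.5])
print(two_temperature_loss(a, c=0, t1=0.7, t2=1.3))  # robust, non-convex variant
print(two_temperature_loss(a, c=0, t1=1.0, t2=1.0))  # standard logistic loss
```

With $t_1 < 1$, $-\log_{t_1}(y)$ is bounded above by $1/(1-t_1)$ as $y \to 0$, which is one way to see how this choice caps the surrogate loss on outliers.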