When many labels are possible, choosing a single one can lead to low precision. A common alternative, referred to as top-$K$ classification, is to choose some number $K$ (commonly around 5) and to return the $K$ labels with the highest scores. Unfortunately, for unambiguous cases, $K > 1$ is too many and, for very ambiguous cases, $K \leq 5$ (for example) can be too small. A sensible alternative strategy is to use an adaptive approach in which the number of labels returned varies as a function of the computed ambiguity, but must average to some particular $K$ over all the samples. We denote this alternative average-$K$ classification. This paper formally characterizes the ambiguity profile under which average-$K$ classification can achieve a lower error rate than fixed top-$K$ classification. Moreover, it provides natural estimation procedures for both the fixed-size and the adaptive classifier and proves their consistency. Finally, it reports experiments on real-world image data sets revealing the benefit of average-$K$ classification over top-$K$ in practice. Overall, when the ambiguity is known precisely, average-$K$ is never worse than top-$K$, and, in our experiments, when it is estimated, this also holds.
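To make the distinction concrete, below is a minimal sketch in Python of the two set-valued prediction rules described above. It assumes a batch of per-class scores is available as a NumPy array; the global-threshold construction used here to realize average-$K$ is one illustrative way to make set sizes average to $K$, not necessarily the estimation procedure studied in the paper.

```python
import numpy as np

def top_k_sets(scores, k):
    """Fixed-size rule: return the k highest-scoring labels for every sample."""
    idx = np.argsort(scores, axis=1)[:, ::-1][:, :k]
    return [set(row.tolist()) for row in idx]

def average_k_sets(scores, k):
    """Adaptive rule: a single global threshold on the scores, chosen so the
    average number of returned labels over the batch equals k.  Ambiguous
    samples (many high scores) get larger sets; clear-cut ones get smaller sets."""
    n = scores.shape[0]
    flat = np.sort(scores, axis=None)[::-1]
    tau = flat[n * k - 1]  # the (n*k)-th largest score in the whole batch
    # Ties exactly at tau can make the realized average deviate slightly from k.
    return [set(np.flatnonzero(row >= tau).tolist()) for row in scores]

# Example: 3 samples over 4 classes; sample 0 is unambiguous, sample 2 is not.
scores = np.array([[0.97, 0.01, 0.01, 0.01],
                   [0.60, 0.30, 0.05, 0.05],
                   [0.30, 0.28, 0.22, 0.20]])
print(top_k_sets(scores, 2))      # every set has exactly 2 labels
print(average_k_sets(scores, 2))  # set sizes 1, 2, 3 -- averaging 2 per sample
```

On the example batch, the adaptive rule returns a single label for the unambiguous first sample and three labels for the highly ambiguous third one, while still returning two labels per sample on average, which is the behavior the abstract contrasts with fixed top-$K$.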