P-Values for Classification

18 January 2008

Abstract

Let (X,Y) be a random variable consisting of a feature vector X and a class label Y in {1, 2, ..., L} with unknown conditional distributions P_b = L(X | Y = b). In addition, let D be a training data set consisting of n independent copies of (X,Y). Usual classification procedures provide point predictors (classifiers) of Y or estimate posterior distributions of Y given X. To quantify the certainty of classifying X, we propose to construct for each b = 1, 2, ..., L a nonparametric p-value pi_b(X,D) for the null hypothesis that Y = b, treating Y temporarily as a fixed parameter. In other words, point predictors are replaced with a prediction region for Y with a given confidence level in a frequentist sense. We argue that this approach has advantages over Bayesian approaches and discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
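
To make the idea of class-wise p-values and the resulting prediction region concrete, here is a minimal Python sketch. It is not taken from the paper: the rank-based construction, the distance-to-class-mean score, the function names class_p_values and prediction_region, and the synthetic two-class data are all assumptions chosen for illustration. For each candidate label b, the new observation is pooled with the class-b training points, every pooled point receives a score, and the p-value is the fraction of pooled points whose score is at least as large as that of the new observation.

```python
import numpy as np


def class_p_values(x_new, X_train, y_train, labels):
    """Return {b: p-value for the hypothesis Y = b} (illustrative sketch).

    Assumed score: Euclidean distance to the mean of the pooled class-b
    sample. The pool includes x_new, so the score treats all pooled
    points symmetrically.
    """
    x_new = np.asarray(x_new, dtype=float)
    p_values = {}
    for b in labels:
        X_b = X_train[y_train == b]
        # Temporarily treat x_new as a member of class b.
        pool = np.vstack([X_b, x_new[None, :]])
        center = pool.mean(axis=0)
        scores = np.linalg.norm(pool - center, axis=1)
        # Fraction of pooled points (x_new included) that look at least
        # as atypical as x_new; equals (count + 1) / (n_b + 1).
        p_values[b] = float(np.mean(scores >= scores[-1]))
    return p_values


def prediction_region(p_values, alpha):
    """Labels whose hypothesis Y = b is not rejected at level alpha."""
    return [b for b, p in p_values.items() if p > alpha]


# Hypothetical training data: two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X_train = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 2)),
    rng.normal(loc=5.0, scale=1.0, size=(50, 2)),
])
y_train = np.array([1] * 50 + [2] * 50)

p = class_p_values([0.0, 0.0], X_train, y_train, labels=[1, 2])
print(p)
print(prediction_region(p, alpha=0.05))
```

Under the hypothesis Y = b, the new observation and the class-b training points are exchangeable, so a rank-based p-value of this kind is at most alpha with probability at most alpha. The prediction region can contain one label, several labels, or none, which is how it conveys the certainty of a classification rather than only a point prediction.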
