Optimally Efficient Sequential Calibration of Binary Classifiers to Minimize Classification Error

In this work, we aim to calibrate the score outputs of an estimator for the binary classification problem by finding an 'optimal' mapping to class probabilities, where the 'optimal' mapping is in the sense that it minimizes the classification error (or, equivalently, maximizes the accuracy). We show that for the given target variables and the score outputs of an estimator, an 'optimal' soft mapping, which monotonically maps the score values to probabilities, is a hard mapping that maps the score values to 0 and 1. We show that this hard-mapping characteristic is preserved for class-weighted errors (where the accuracy for one class is more important), for sample-weighted errors (where the samples' accurate classifications are not equally important), and even for general linear losses. We propose a sequential recursive merger approach, which produces an 'optimal' hard mapping (for the samples observed so far) sequentially with each incoming new sample. Our approach has a time complexity that is logarithmic in the sample size, which is optimally efficient.
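As a small illustration of the hard-mapping result, the batch sketch below searches over all thresholds on the sorted scores for the one that minimizes the classification error; this is not the paper's sequential recursive merger (which processes samples one at a time in logarithmic time per sample), and the helper name `optimal_hard_mapping` is hypothetical.

```python
import numpy as np

def optimal_hard_mapping(scores, labels):
    """Return a threshold t and the resulting error rate when predicting
    class 1 for score >= t and class 0 otherwise, chosen to minimize the
    number of misclassified samples (batch version, for illustration)."""
    order = np.argsort(scores)
    s = np.asarray(scores, dtype=float)[order]
    y = np.asarray(labels)[order]
    n = len(y)
    # pos_prefix[k]: number of positives among the k lowest-scoring samples
    pos_prefix = np.concatenate(([0], np.cumsum(y)))
    total_pos = pos_prefix[-1]
    # neg_suffix[k]: number of negatives among the n - k highest-scoring samples
    neg_suffix = (n - np.arange(n + 1)) - (total_pos - pos_prefix)
    # errors[k]: predict 0 for the bottom k samples and 1 for the rest
    errors = pos_prefix + neg_suffix
    k = int(np.argmin(errors))
    if k == 0:
        t = -np.inf          # predict 1 for every sample
    elif k == n:
        t = np.inf           # predict 0 for every sample
    else:
        t = (s[k - 1] + s[k]) / 2.0
    return t, errors[k] / n
```

For example, with scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, no monotone soft mapping can do better than the hard 0/1 mapping induced by the returned threshold, which misclassifies exactly one of the four samples.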