This paper concerns the construction of universal tests for binary hypothesis testing in which the alternative hypothesis is poorly modeled and the observation space is large. The mismatched universal test is a feature-based technique for this purpose. Prior work has shown that its finite-observation performance can be far better than that of the (optimal) Hoeffding test; however, good performance depends crucially on the choice of features. The contributions of this paper are threefold: 1) We obtain bounds on the number of easily distinguishable distributions in an exponential family. 2) This motivates a new framework for feature extraction, cast as a rank-constrained optimization problem. 3) We derive a gradient-based algorithm to solve the rank-constrained optimization problem and prove its local convergence. Numerical experiments demonstrate that the algorithm performs well.
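The abstract does not spell out the paper's algorithm, but the general shape of a gradient-based method for a rank-constrained problem can be sketched on a toy objective. The snippet below is an illustrative assumption, not the paper's method: it minimizes a simple Frobenius-norm objective `||X - A||_F^2` subject to `rank(X) <= r` by alternating gradient steps with projection onto the rank-`r` set via truncated SVD (the `project_rank` helper and the objective are both hypothetical stand-ins).

```python
import numpy as np

def project_rank(X, r):
    """Project X onto the set of matrices of rank at most r via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = 0.0  # zero out all but the r largest singular values
    return (U * s) @ Vt

def projected_gradient(A, r, step=0.25, iters=200):
    """Toy rank-constrained solver: minimize ||X - A||_F^2 s.t. rank(X) <= r.

    Gradient of the objective is 2*(X - A); each gradient step is followed
    by a projection back onto the rank-r set. This illustrates the general
    projected-gradient template, not the algorithm from the paper.
    """
    X = np.zeros_like(A)
    for _ in range(iters):
        X = project_rank(X - step * 2.0 * (X - A), r)
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
X = projected_gradient(A, r=2)
print(np.linalg.matrix_rank(X))  # the rank constraint holds at the solution
```

For this particular objective the projection step recovers the Eckart–Young best rank-`r` approximation, which is why convergence is fast; a nonconvex objective, as in the feature-extraction setting, would generally give only local convergence, consistent with the guarantee stated above.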