Generalized Error Exponents for Sparse Sample Goodness of Fit Tests

We investigate the sparse sample goodness-of-fit problem, where the number of samples $n$ is smaller than the size of the alphabet $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ tend to infinity, and $n = o(m)$. We propose a new performance criterion based on large deviation analysis, which generalizes the classical error exponent applicable for large sample problems (in which $m = O(n)$). This new criterion provides insights that are not available from asymptotic consistency or central limit theorem (CLT) analysis. The main results are: (i) The best achievable probability of error $P_e$ decays as $-\log(P_e) = (n^2/m)(1+o(1))J$ for some $J > 0$. (ii) A well-known coincidence-based test attains the optimal generalized error exponent. (iii) The widely used Pearson's chi-square test has $J = 0$. (iv) The contributions (i)-(iii) are established under the assumption that the distribution under the null hypothesis is uniform. For the non-uniform case, a new test is proposed, with a non-zero generalized error exponent.
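To make the two statistics compared above concrete, the following is a minimal Python sketch for n samples drawn from an alphabet of size m under a uniform null. It assumes that the coincidence-based test is built on the number of symbols observed exactly once (as in Paninski's coincidence test); the abstract does not spell out the statistic, so this choice, the function names, and the demo parameters are all illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter


def singleton_count(samples):
    """Number of symbols observed exactly once (assumed coincidence statistic)."""
    counts = Counter(samples)
    return sum(1 for c in counts.values() if c == 1)


def expected_singletons_uniform(n, m):
    """Expected singleton count for n i.i.d. uniform samples on m symbols."""
    # Each symbol is seen exactly once with probability n * (1/m) * (1 - 1/m)^(n-1);
    # summing over the m symbols gives n * (1 - 1/m)^(n-1).
    return n * (1.0 - 1.0 / m) ** (n - 1)


def pearson_chi_square(samples, m):
    """Pearson's chi-square statistic against the uniform distribution on m symbols."""
    n = len(samples)
    expected = n / m
    counts = Counter(samples)
    seen_part = sum((c - expected) ** 2 / expected for c in counts.values())
    # Each unobserved symbol contributes (0 - expected)^2 / expected = expected.
    unseen_part = (m - len(counts)) * expected
    return seen_part + unseen_part


if __name__ == "__main__":
    rng = random.Random(0)
    m, n = 10_000, 500  # illustrative sparse regime: n much smaller than m
    null_samples = [rng.randrange(m) for _ in range(n)]
    # Hypothetical alternative: uniform on the first tenth of the alphabet.
    alt_samples = [rng.randrange(m // 10) for _ in range(n)]
    print("expected singletons under null:", expected_singletons_uniform(n, m))
    print("singletons (null):", singleton_count(null_samples))
    print("singletons (alternative):", singleton_count(alt_samples))
    print("chi-square (null):", pearson_chi_square(null_samples, m))
    print("chi-square (alternative):", pearson_chi_square(alt_samples, m))
```

A test of this kind would reject the null when the singleton count falls well below its expected value, since extra coincidences leave fewer symbols seen exactly once. The threshold is left out here because the abstract does not specify one.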