We study an outlying sequence detection problem, in which a small subset of outliers must be detected among a collection of sequences of samples. A sequence is considered an outlier if its observations are generated by a distribution different from those generating the observations in the majority of the sequences. In the universal setting, the goal is to identify all the outliers without any knowledge of the underlying generating distributions. In prior work, this problem was studied as a universal hypothesis testing problem: a generalized likelihood (GL) test was constructed and its asymptotic performance was characterized. In this paper, we propose a different class of tests for this problem, based on distribution clustering. These tests are shown to be exponentially consistent, with time complexity linear in the total number of sequences, in contrast with the GL test, whose time complexity is exponential in the number of outliers. Furthermore, our clustering-based tests are applicable to more general scenarios. For example, when both the typical and outlier distributions form clusters, the clustering-based test is exponentially consistent, whereas the GL test is not even applicable.
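To make the idea of a clustering-based test concrete, the following is a minimal sketch and not the paper's actual test. It assumes a finite alphabet with symbols encoded as integers, exactly two clusters, KL divergence as the distance between empirical distributions, and a simple k-means-style iteration. The function names (`empirical_pmf`, `kl`, `detect_outliers`) and all parameter choices are hypothetical and introduced only for illustration.

```python
import numpy as np

def empirical_pmf(seq, alphabet_size):
    """Empirical distribution of a sequence of integer symbols."""
    counts = np.bincount(seq, minlength=alphabet_size).astype(float)
    return counts / counts.sum()

def kl(p, q, eps=1e-12):
    """KL divergence D(p || q), clipped to avoid log(0) (an assumption)."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

def detect_outliers(sequences, alphabet_size, n_iter=20):
    """Cluster empirical distributions into two groups and declare the
    smaller group to be the outliers (illustrative sketch only)."""
    pmfs = np.array([empirical_pmf(s, alphabet_size) for s in sequences])

    # Initialization (assumed): first center is the first sequence's pmf,
    # second center is the pmf farthest from it in KL divergence.
    c0 = pmfs[0]
    c1 = pmfs[np.argmax([kl(p, c0) for p in pmfs])]
    centers = np.stack([c0, c1])

    for _ in range(n_iter):
        # Assignment step: each pmf goes to its nearest center.
        dists = np.array([[kl(p, c) for c in centers] for p in pmfs])
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its members.
        new_centers = centers.copy()
        for k in range(2):
            if np.any(labels == k):
                new_centers[k] = pmfs[labels == k].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Outliers are assumed to be the minority cluster.
    minority = np.argmin(np.bincount(labels, minlength=2))
    return np.flatnonzero(labels == minority)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    typical, outlier = [0.7, 0.2, 0.1], [0.1, 0.2, 0.7]
    seqs = [rng.choice(3, size=500, p=outlier if i in (3, 11) else typical)
            for i in range(20)]
    print(detect_outliers(seqs, alphabet_size=3))
```

Each iteration of this sketch touches every sequence a constant number of times, so its cost per iteration grows linearly with the number of sequences, in the spirit of the complexity claim above. The minority-cluster rule reflects the assumption that outliers form a small subset.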