Out-of-Distribution Detection with Distance Guarantee in Deep Generative Models - OODDDRL
It is challenging to detect anomalies (or out-of-distribution (OOD) data) in deep generative models (DGMs), including flow-based models and variational autoencoders (VAEs). In this paper, we prove that, for a well-trained flow-based model, the distance between the distribution of representations of an OOD dataset and the prior can be large, provided that the distance between the distributions of the training dataset and the OOD dataset is large enough. Since the most commonly used prior in flow-based models is factorized, the distribution of representations of an OOD dataset tends to be non-factorized when it is far from the prior. Furthermore, we observe that the distribution of OOD representations in a flow-based model is also Gaussian-like. Based on our theorem and this key observation, we propose an easy-to-perform method for both group and point-wise anomaly detection that estimates the total correlation of representations in a DGM. We have conducted extensive experiments on prevalent benchmarks to evaluate our method. For group anomaly detection (GAD), our method achieves near-100% AUROC on all problems and is robust against data manipulation. In contrast, the state-of-the-art (SOTA) GAD method performs no better than random guessing on challenging problems and can be attacked by data manipulation in almost all cases. For point-wise anomaly detection (PAD), our method is comparable to the SOTA PAD method on one category of problems and achieves near-100% AUROC on another category where the SOTA PAD method fails.
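Because the prior is factorized and the abstract observes that OOD representations are Gaussian-like, the total correlation of a batch of representations has a closed form under a Gaussian fit: the sum of marginal entropies minus the joint entropy reduces to comparing the log-variances against the log-determinant of the covariance. The sketch below illustrates this Gaussian-fit estimator; the function name and this specific estimator are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def gaussian_total_correlation(z):
    """Estimate the total correlation of representations z of shape
    (n_samples, dim) under a Gaussian fit.

    For a Gaussian, TC = 0.5 * (sum of log marginal variances
    - log det of the full covariance). It is 0 iff the fitted
    Gaussian is factorized, and grows with cross-dimension dependence.
    """
    cov = np.cov(z, rowvar=False)
    sign, logdet = np.linalg.slogdet(cov)  # numerically stable log det
    return 0.5 * (np.sum(np.log(np.diag(cov))) - logdet)
```

In this view, a batch whose representations match the factorized prior scores near zero, while a batch mapped far from the prior (e.g. an OOD group) exhibits correlated dimensions and a large score, which can then be thresholded for group-level detection.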