Recent improvements in anomaly detection methods have prompted research into anomaly segmentation, i.e., finding the pixels of an image that contain anomalies. In this paper, we investigate novel methods for unleashing the full power of pretrained features for anomaly segmentation. We first present a simple baseline that uses a pyramid of deep convolutional features and show that it significantly improves over state-of-the-art methods, which are much more complex. One issue with the baseline approach is that it cannot use the global context of the image effectively. We show that global attention-based methods are better able to utilize the global context. Specifically, we present an approach based on a multi-scale transformer architecture and show that it further improves performance. By analysing the attention maps, we find that they often detect anomalous image regions in a zero-shot fashion, providing some insight into the result. A qualitative evaluation of our method shows significant gains.