Learning a Weight Map for Weakly-Supervised Localization

Abstract

In the weakly supervised localization setting, supervision is given as an image-level label. We propose to employ an image classifier f and to train a generative network g that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image. Network g is trained by minimizing the discrepancy between the output of the classifier f on the original image and its output given the same image weighted by the output of g. The scheme requires a regularization term that ensures that g does not provide a uniform weight, and an early stopping criterion in order to prevent g from over-segmenting the image. Our results indicate that the method outperforms existing localization methods by a sizable margin on challenging fine-grained classification datasets, as well as on a generic image recognition dataset. Additionally, the obtained weight map is state-of-the-art in weakly supervised segmentation on fine-grained categorization datasets.
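
The training objective described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the linear `toy_classifier` standing in for f, the squared-error discrepancy, the mean-weight penalty, and the `reg_strength` parameter are all assumptions made for illustration, since the abstract does not specify these forms. The weight map is passed in directly rather than produced by a network g.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_classifier(image, class_weights):
    """Hypothetical stand-in for f: one linear layer plus softmax on flattened pixels."""
    logits = [sum(w * p for w, p in zip(row, image)) for row in class_weights]
    return softmax(logits)


def localization_loss(image, weight_map, class_weights, reg_strength=0.1):
    """Discrepancy between f(image) and f(weight_map * image), plus a penalty.

    Both the squared-error discrepancy and the mean-weight penalty are
    assumed forms; the abstract only says that such terms exist.
    """
    weighted = [m * p for m, p in zip(weight_map, image)]
    full_pred = toy_classifier(image, class_weights)
    masked_pred = toy_classifier(weighted, class_weights)
    discrepancy = sum((a - b) ** 2 for a, b in zip(full_pred, masked_pred))
    # Assumed regularizer: charge for the average weight so that a uniform
    # all-ones map, which trivially gives zero discrepancy, is not optimal.
    penalty = reg_strength * sum(weight_map) / len(weight_map)
    return discrepancy + penalty


# Four-pixel toy image; pixels 0 and 2 carry the evidence for class 0.
image = [1.0, 0.0, 2.0, 0.0]
class_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

uniform_loss = localization_loss(image, [1.0, 1.0, 1.0, 1.0], class_weights)
focused_loss = localization_loss(image, [1.0, 0.0, 1.0, 0.0], class_weights)
empty_loss = localization_loss(image, [0.0, 0.0, 0.0, 0.0], class_weights)
```

In this toy setup the map that keeps only the informative pixels scores lower than the uniform map, because it preserves the classifier output while paying a smaller penalty, and an all-zero map scores worst because the classifier output changes. In the method itself, g would be trained by gradient descent on such an objective and stopped early, as the abstract notes.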
