Learning a Weight Map for Weakly-Supervised Localization

In the weakly supervised localization setting, supervision is given as an image-level label. We propose to employ an image classifier and to train a generative network that, given an input image, outputs a per-pixel weight map indicating the location of the object within the image. The generative network is trained by minimizing the discrepancy between the classifier's output on the original image and its output on the same image weighted by the generated map. The scheme requires a regularization term that ensures the network does not output a uniform weight map, and an early stopping criterion to prevent it from over-segmenting the image. Our results indicate that the method outperforms existing localization methods by a sizable margin on challenging fine-grained classification datasets, as well as on a generic image recognition dataset. Additionally, the obtained weight map achieves state-of-the-art results in weakly supervised segmentation on fine-grained categorization datasets.
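As a rough illustration of the training scheme described above, the following sketch computes a single loss term combining the classifier discrepancy with a regularizer that discourages a uniform weight map. The names (`clf`, `gen`), the choice of KL divergence as the discrepancy, and the mean-mask regularizer are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def localization_loss(clf, gen, images, reg_weight=1.0):
    """Sketch of the loss: classifier discrepancy plus a non-uniformity regularizer.

    `clf` is a frozen image classifier; `gen` maps an image to a per-pixel
    weight map in [0, 1]. The specific discrepancy (KL divergence) and
    regularizer (mean mask value) are assumptions for illustration.
    """
    with torch.no_grad():
        ref_logits = clf(images)                  # classifier output on the original image

    weight_map = torch.sigmoid(gen(images))       # per-pixel weights, shape (B, 1, H, W)
    masked_logits = clf(images * weight_map)      # classifier output on the re-weighted image

    # Discrepancy between the classifier's predictions on the two inputs.
    discrepancy = F.kl_div(
        F.log_softmax(masked_logits, dim=1),
        F.softmax(ref_logits, dim=1),
        reduction="batchmean",
    )

    # Penalize large average mask values so the trivial all-ones (uniform)
    # weight map does not minimize the loss.
    regularizer = weight_map.mean()

    return discrepancy + reg_weight * regularizer
```

In practice an early stopping criterion, as noted in the abstract, would also be needed to keep the map from growing to cover the whole image.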