Hard Pixels Mining: Learning Using Privileged Information for Semantic Segmentation

Incorporating depth features into RGB features has been shown to improve semantic segmentation. However, depth information is usually unavailable for test images. In this paper, we leverage only the depth of training images, as privileged information, to mine the hard pixels in semantic segmentation. Specifically, we propose a novel Loss Weight Module (LWM), which outputs a loss weight map based on two depth-related measurements of hard pixels: Depth Prediction Error (DPE) and Depth-aware Segmentation Error (DSE). The loss weight map is then applied to the segmentation loss so that training pays more attention to the hard pixels, with the aim of learning a more robust model. We also explore a curriculum learning strategy based on the loss weight map. To mine hard pixels at different scales, we apply our loss weight module to multi-scale side outputs. Our hard pixels mining method achieves state-of-the-art results on two benchmark datasets, and even outperforms methods that require depth input at test time.
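
To make the idea of applying a loss weight map to the segmentation loss concrete, the following is a minimal sketch in NumPy. It is not the paper's Loss Weight Module: the abstract does not give the formulas for DPE or DSE or say how they are combined, so `toy_weight_map` is a hypothetical stand-in that derives weights from a normalized absolute depth prediction error only. The function names, the weight range of 1 to 2, and the normalization by the sum of weights are all assumptions made for illustration.

```python
import numpy as np


def weighted_pixel_cross_entropy(logits, labels, weight_map):
    """Per-pixel cross-entropy averaged with a per-pixel weight map.

    logits:     (H, W, C) float array of class scores
    labels:     (H, W) integer array of ground-truth class indices
    weight_map: (H, W) non-negative float array (larger = harder pixel)
    """
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the true class at every pixel.
    nll = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # Weighted average: pixels with larger weights contribute more.
    return float((weight_map * nll).sum() / weight_map.sum())


def toy_weight_map(depth_pred, depth_true, eps=1e-8):
    """Hypothetical stand-in for a loss weight map (an assumption,
    not the paper's LWM): weights lie in [1, 2] and grow with the
    absolute depth prediction error at each pixel."""
    err = np.abs(depth_pred - depth_true)
    return 1.0 + err / (err.max() + eps)


# Two pixels, two classes: the first is classified correctly,
# the second is misclassified and also has a large depth error.
logits = np.array([[[5.0, 0.0], [5.0, 0.0]]])
labels = np.array([[0, 1]])
depth_true = np.array([[1.0, 1.0]])
depth_pred = np.array([[1.0, 3.0]])

weights = toy_weight_map(depth_pred, depth_true)
plain_loss = weighted_pixel_cross_entropy(logits, labels, np.ones_like(weights))
mined_loss = weighted_pixel_cross_entropy(logits, labels, weights)
```

In this toy case the misclassified pixel also has the larger depth error, so it receives the larger weight and `mined_loss` exceeds `plain_loss`. Depth is needed only to build the weight map during training, which is consistent with the abstract's point that no depth input is required at test time.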