In image labeling, local representations for image units (pixels, patches, or superpixels) are usually generated from their surrounding image patches, so long-range contextual information is not effectively encoded. In this paper, we introduce recurrent neural networks (RNNs) to address this issue. Furthermore, we propose directed acyclic graph RNNs (DAG-RNNs) to process DAG-structured data, which enables the network to model long-range semantic dependencies among image units. Our DAG-RNNs substantially enhance the discriminative power of local representations, which significantly benefits local classification. We achieve state-of-the-art results on the challenging CamVid, SiftFlow, and Barcelona benchmarks.
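The core idea can be illustrated with a minimal sketch: an image grid is treated as a DAG, and hidden states are propagated in topological order so each unit's representation absorbs context from units earlier in the ordering. The sketch below shows one sweep in the south-east direction with a single shared recurrent weight matrix; the function name, weight shapes, and the choice to share one recurrent matrix across both predecessor edges are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def dag_rnn_sweep(X, Wx, Uh, b):
    """One DAG-RNN sweep over an image grid, south-east direction.

    X  : (rows, cols, in_dim) array of local features per image unit.
    Wx : (hid_dim, in_dim) input projection.
    Uh : (hid_dim, hid_dim) recurrent weights (shared across both
         predecessor edges here -- an illustrative simplification).
    b  : (hid_dim,) bias.

    Each hidden state aggregates the states of its north and west
    predecessors, so context propagates across the whole grid.
    """
    rows, cols, _ = X.shape
    hid = Wx.shape[0]
    H = np.zeros((rows, cols, hid))
    for i in range(rows):          # topological order for this DAG
        for j in range(cols):
            ctx = np.zeros(hid)
            if i > 0:
                ctx += Uh @ H[i - 1, j]   # north predecessor
            if j > 0:
                ctx += Uh @ H[i, j - 1]   # west predecessor
            H[i, j] = np.tanh(Wx @ X[i, j] + ctx + b)
    return H

# Tiny usage example with random features and small weights.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5, 3))
H = dag_rnn_sweep(
    X,
    0.1 * rng.standard_normal((8, 3)),
    0.1 * rng.standard_normal((8, 8)),
    np.zeros(8),
)
```

In the full model, analogous sweeps from the remaining corner directions would be combined so every unit receives context from the entire image, before the enriched representations are fed to a local classifier.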