Do Convolutional Neural Networks act as Compositional Nearest Neighbors?

We present a simple approach based on pixel-wise nearest neighbors for understanding and interpreting the internal operations of state-of-the-art convolutional neural networks trained on pixel-level tasks. Specifically, we analyze both the synthesis process of generative models and the prediction mechanism of discriminative models. The central hypothesis of this work is that such networks learn a fast compositional nearest-neighbor synthesis or prediction function. Our experiments on semantic segmentation and image-to-image translation provide qualitative and quantitative evidence supporting this hypothesis.
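The compositional nearest-neighbor view suggests that each output pixel can be produced by matching that pixel's feature embedding against pixels from the training set and copying the corresponding output value. Below is a minimal NumPy sketch of this idea; the function name `compositional_nn` and the flat `(N, D)` training-feature layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def compositional_nn(query_feats, train_feats, train_outputs):
    """Illustrative pixel-wise nearest-neighbor composition (assumed interface).

    query_feats:   (H, W, D) feature map of the query image
    train_feats:   (N, D)    features of N training pixels
    train_outputs: (N, C)    output values (e.g. colors or labels) per training pixel
    Returns an (H, W, C) output built by copying, for each query pixel,
    the output of its nearest training pixel in feature space.
    """
    H, W, D = query_feats.shape
    q = query_feats.reshape(-1, D)                      # flatten pixels to (H*W, D)
    # Squared Euclidean distance from every query pixel to every training pixel.
    dists = ((q[:, None, :] - train_feats[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)                          # nearest training pixel per query pixel
    return train_outputs[idx].reshape(H, W, -1)         # compose output pixel-by-pixel
```

This brute-force version is O(H·W·N·D); an approximate nearest-neighbor index would be needed at realistic scale, but the sketch captures the hypothesized mechanism.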