
Self-Supervised Representation Learning as Multimodal Variational Inference

Abstract

In this paper, we propose a probabilistic extension of SimSiam, a recent self-supervised learning (SSL) method. The extension makes SimSiam uncertainty-aware by treating it as a generative model of augmented views and learning it through variational inference. SimSiam trains a model by maximizing the similarity between the representations of different augmented views of the same image. The augmentation process sometimes produces ambiguous images, and their representations may be uncertain. Although uncertainty-aware machine learning, such as deep variational inference, is becoming common, SimSiam and other SSL methods are insufficiently uncertainty-aware, which limits how well they can use ambiguous augmented images. Our main contributions are twofold. First, we clarify the theoretical relationship between non-contrastive SSL and multimodal variational inference. Second, we introduce a novel SSL method called variational inference SimSiam (VI-SimSiam), which incorporates uncertainty through spherical posterior distributions. Experimental results show that VI-SimSiam outperforms SimSiam in classification tasks on several datasets, such as ImageNette and ImageWoof, by successfully estimating the representation uncertainty.
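For reference, the deterministic SimSiam objective that VI-SimSiam extends can be sketched as follows. This is a minimal illustration, assuming the symmetrized negative-cosine loss from the original SimSiam paper (Chen and He), with plain Python lists standing in for the predictor outputs `p1`, `p2` and projector outputs `z1`, `z2` of two augmented views. These names are illustrative and do not come from this abstract, and the sketch does not implement the variational, uncertainty-aware loss proposed here.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def negative_cosine(p: Sequence[float], z: Sequence[float]) -> float:
    """D(p, z) = -(p / ||p||) . (z / ||z||).

    In SimSiam, z is wrapped in a stop-gradient so that no gradient flows
    through it. This plain-Python sketch only computes the forward value,
    where stop-gradient is the identity; in an autodiff framework z would
    be detached before this call.
    """
    p_n = _normalize(p)
    z_n = _normalize(z)
    return -sum(a * b for a, b in zip(p_n, z_n))


def simsiam_loss(p1: Sequence[float], z1: Sequence[float],
                 p2: Sequence[float], z2: Sequence[float]) -> float:
    """Symmetrized loss: 1/2 D(p1, sg(z2)) + 1/2 D(p2, sg(z1))."""
    return 0.5 * negative_cosine(p1, z2) + 0.5 * negative_cosine(p2, z1)


if __name__ == "__main__":
    # Views whose representations point in the same direction reach the minimum, -1.
    print(simsiam_loss([1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.5, 0.0]))
```

Because every representation is normalized onto the unit sphere before comparison, SimSiam's representations effectively live on a hypersphere. This is one way to see why a spherical distribution is a natural choice of posterior when adding uncertainty, although the specific distribution used by VI-SimSiam is not shown here.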
