Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Abstract

We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) has a unified formulation as coordinate-wise optimization over the network parameters θ and the pairwise importance α, where the max player θ learns representations for contrastiveness, and the min player α puts more weight on pairs of distinct samples that share similar representations. The resulting formulation, called α-CL, not only unifies various existing contrastive losses, which differ in how the sample-pair importance α is constructed, but can also extrapolate to novel contrastive losses beyond the popular ones, opening a new avenue for contrastive loss design. These novel losses yield performance comparable to (or better than) classic InfoNCE on CIFAR-10, STL-10, and CIFAR-100. Furthermore, we analyze the max player in detail: we prove that with fixed α, the max player is equivalent to Principal Component Analysis (PCA) for deep linear networks, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis of the max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.
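
To make the coordinate-wise view concrete, here is a minimal PyTorch sketch (our illustration, not code from the paper): the min player chooses the pair weights α with an InfoNCE-style softmax over distinct pairs and is held fixed via a stop-gradient, while the max player updates θ through the resulting α-weighted contrastive objective. The helper name alpha_cl_loss and the temperature tau are hypothetical, and the exact weighting in the paper's α-CL family may differ.

```python
import torch
import torch.nn.functional as F

def alpha_cl_loss(z1, z2, tau=0.5):
    """z1, z2: (N, d) representations of two augmented views of the same batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.t() / tau  # pairwise similarities s_ij
    mask = torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    # Min player: put more weight on distinct pairs (i != j) with similar
    # representations; held fixed (stop-gradient) while the max player moves.
    with torch.no_grad():
        alpha = F.softmax(sim.masked_fill(mask, float('-inf')), dim=1)
    # Max player: pull positive-pair similarity s_ii up while pushing the
    # alpha-weighted similarities of distinct pairs down.
    pos = sim.diagonal()
    neg = (alpha * sim.masked_fill(mask, 0.0)).sum(dim=1)
    return (neg - pos).mean()

# Usage sketch: gradients flow to the representations (i.e., through θ) only.
z1 = torch.randn(256, 128, requires_grad=True)
z2 = torch.randn(256, 128, requires_grad=True)
alpha_cl_loss(z1, z2).backward()
```

The stop-gradient on α reflects the coordinate-wise structure: each step alternates between resolving the min player in closed form and taking a gradient step for the max player, and swapping in a different rule for α yields a different member of the loss family.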
