XFACTORS: Disentangled Information Bottleneck via Contrastive Supervision

Alexandre Myara
Nicolas Bourriez
Thomas Boyer
Thomas Lemercier
Ihab Bendidi
Auguste Genovesio
Main: 8 Pages
21 Figures
Bibliography: 3 Pages
8 Tables
Appendix: 15 Pages
Abstract

Disentangled representation learning aims to map independent factors of variation to independent representation components. On one hand, purely unsupervised approaches have proven successful on fully disentangled synthetic data, but fail to recover semantic factors from real data without strong inductive biases. On the other hand, supervised approaches are unstable and hard to scale to large attribute sets because they rely on adversarial objectives or auxiliary classifiers. We introduce \textsc{XFactors}, a weakly-supervised VAE framework that disentangles and provides explicit control over a chosen set of factors. Building on the Disentangled Information Bottleneck perspective, we decompose the representation into factor-specific subspaces $\mathcal{T}_1,\ldots,\mathcal{T}_K$ and a residual subspace $\mathcal{S}$. Each target factor is encoded in its assigned $\mathcal{T}_i$ through contrastive supervision: an InfoNCE loss pulls together latents sharing the same factor value and pushes apart mismatched pairs. In parallel, KL regularization imposes a Gaussian structure on both $\mathcal{S}$ and the aggregated factor subspaces, organizing the geometry without additional supervision for non-targeted factors and avoiding adversarial training and classifiers. Across multiple datasets, with constant hyperparameters, \textsc{XFactors} achieves state-of-the-art disentanglement scores and yields consistent qualitative factor alignment in the corresponding subspaces, enabling controlled factor swapping via latent replacement. We further demonstrate that our method scales correctly with increasing latent capacity and evaluate it on the real-world dataset CelebA. Our code is available at this https URL.
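
The three ingredients named in the abstract (an InfoNCE loss on a factor subspace, a Gaussian KL term, and factor swapping by latent replacement) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cosine-similarity scoring, the temperature value, and the averaging over positives are all assumptions chosen for illustration, and the abstract does not specify the exact form of any of these terms.

```python
import numpy as np

def info_nce_same_label(z, labels, temperature=0.1):
    """InfoNCE-style loss on one factor subspace (hypothetical formulation).

    Positives for an anchor are the other batch items with the same factor
    value; every other item in the batch acts as a negative.
    Assumes at least one anchor in the batch has a positive.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarity (assumption)
    logits = z @ z.T / temperature
    self_mask = np.eye(len(z), dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)       # an item is never its own pair
    # Numerically stable log-softmax over the other items in the batch.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)                           # skip anchors without positives
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / pos.sum(axis=1)[has_pos]).mean())

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch."""
    return float(0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)))

def swap_factor(z_a, z_b, factor_slice):
    """Copy z_a, replacing the coordinates of one factor subspace with those of z_b."""
    z = z_a.copy()
    z[factor_slice] = z_b[factor_slice]
    return z

# Toy batch: four 2-D latents that form two tight clusters.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned = info_nce_same_label(z, np.array([0, 0, 1, 1]))     # labels match the clusters
misaligned = info_nce_same_label(z, np.array([0, 1, 0, 1]))  # labels cut across clusters
```

On this toy batch the loss is lower when the labels agree with the cluster structure, which is the pull-together and push-apart behaviour the abstract describes. In `swap_factor`, the slice stands in for the coordinates of one $\mathcal{T}_i$; decoding the swapped latent would require the trained decoder, which is outside this sketch.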
