CVAE: Gaussian Copula-based VAE Differing Disentangled from Coupled Representations with Contrastive Posterior

We present a self-supervised variational autoencoder (VAE) to jointly learn disentangled and dependent hidden factors and then enhance disentangled representation learning by a self-supervised classifier to eliminate coupled representations in a contrastive manner. To this end, a Contrastive Copula VAE (CVAE) is introduced without relying on prior knowledge about data in the probabilistic principle and involving strong modeling assumptions on the posterior in the neural architecture. CVAE simultaneously factorizes the posterior (evidence lower bound, ELBO) with total correlation (TC)-driven decomposition for learning factorized disentangled representations and extracts the dependencies between hidden features by a neural Gaussian copula for copula coupled representations. Then, a self-supervised contrastive classifier differentiates the disentangled representations from the coupled representations, where a contrastive loss regularizes this contrastive classification together with the TC loss for eliminating entangled factors and strengthening disentangled representations. CVAE demonstrates a strong effect in enhancing disentangled representation learning. CVAE further contributes to improved optimization addressing the TC-based VAE instability and the trade-off between reconstruction and representation.
View on arXiv