
Adversarial Inversion: Inverse Graphics with Adversarial Priors

Abstract

We propose adversarial inversion, a weakly supervised neural network model that combines inverse rendering with adversarial networks. Given a set of images, our inverse rendering encoder predicts a set of latent factors (e.g., depth, camera pose), which a renderer then projects to reconstruct (part of) the visual input. Inversion is often ambiguous: for example, many compositions of 3D shape and camera pose give rise to the same 2D projection. To address this ambiguity, we impose priors on the predicted latent factors through an adversarial discriminator network trained to distinguish predicted factors from ground-truth ones. Training adversarial inversion does not require input-output paired annotations, but merely a collection of ground-truth factors unrelated (unpaired) to the current input. Our model can thus be self-supervised on unlabelled image data by minimizing a joint reconstruction and adversarial loss, complementing any direct supervision provided by paired annotations. We apply adversarial inversion to 3D human pose estimation and to 3D structure and egomotion estimation, and outperform models supervised only by paired annotations and/or reconstruction losses, i.e., models that do not use adversarial priors. Applying adversarial inversion to super-resolution and inpainting results in automated "visual plastic surgery". In adversarial super-resolution, when the discriminator is provided with young, old, female, male, or Tom Cruise faces as ground truth, our model renders the input face towards its young, old, feminine, masculine, or Tom Cruise-like equivalent. In adversarial inpainting, when the discriminator is provided with faces with big lips or big noses as ground truth, it creates visual lip or nose augmentations.
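
To make the training objective concrete, below is a minimal PyTorch sketch of one joint update with a reconstruction loss and an adversarial prior on the predicted factors. It is an illustrative assumption, not the authors' released code: the module names (Encoder, Renderer, FactorDiscriminator), the 32x32 RGB input size, the flat factor vector, and all hyper-parameters are made up for the example, and a small learned decoder stands in for the renderer so the snippet runs end to end.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Inverse-rendering encoder: image -> latent factors (a flat vector
    standing in for, e.g., depth and camera pose)."""
    def __init__(self, factor_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, factor_dim),
        )
    def forward(self, x):
        return self.net(x)

class Renderer(nn.Module):
    """Stand-in decoder that maps predicted factors back to image space
    (32x32 RGB), playing the role of the renderer in this sketch."""
    def __init__(self, factor_dim=32):
        super().__init__()
        self.fc = nn.Linear(factor_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 64, 8, 8))

class FactorDiscriminator(nn.Module):
    """Discriminates predicted factors from unpaired ground-truth factors,
    acting as a learned prior on the latent space."""
    def __init__(self, factor_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(factor_dim, 64), nn.LeakyReLU(0.2),
            nn.Linear(64, 1),
        )
    def forward(self, z):
        return self.net(z)

def training_step(images, real_factors, enc, rend, disc, opt_g, opt_d, lam=0.1):
    """One joint update. `real_factors` are ground-truth factors that are
    unpaired with `images`; `opt_g` should cover enc (and rend) parameters."""
    # Discriminator step: ground-truth factors labelled real, predictions fake.
    with torch.no_grad():
        fake_factors = enc(images)
    real_logits = disc(real_factors)
    fake_logits = disc(fake_factors)
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Encoder step: reconstruct the input and push predicted factors
    # towards the ground-truth factor distribution (fool the discriminator).
    factors = enc(images)
    recon = rend(factors)
    rec_loss = F.l1_loss(recon, images)
    adv_logits = disc(factors)
    adv_loss = F.binary_cross_entropy_with_logits(adv_logits, torch.ones_like(adv_logits))
    g_loss = rec_loss + lam * adv_loss
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

# Example wiring (hypothetical learning rates and optimizers):
# enc, rend, disc = Encoder(), Renderer(), FactorDiscriminator()
# opt_g = torch.optim.Adam(list(enc.parameters()) + list(rend.parameters()), lr=2e-4)
# opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)

Note that the abstract describes the renderer as projecting the predicted factors (depth, camera pose) back to the image, which suggests a geometric projection rather than a learned decoder; the Renderer module above is only a runnable placeholder for that component.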
