
Auto-Encoding for Shared Cross Domain Feature Representation and Image-to-Image Translation

Abstract

Image-to-image translation is a subset of computer vision and pattern recognition problems in which the goal is to learn a mapping between input images of domain $\mathbf{X}_1$ and output images of domain $\mathbf{X}_2$. Current methods use neural networks with an encoder-decoder structure to learn a mapping $G:\mathbf{X}_1 \to \mathbf{X}_2$ such that the distributions of images from $\mathbf{X}_2$ and $G(\mathbf{X}_1)$ are identical, where $G(\mathbf{X}_1) = d_G(f_G(\mathbf{X}_1))$, $f_G(\cdot)$ is referred to as the encoder, and $d_G(\cdot)$ is referred to as the decoder. Currently, methods that also compute an inverse mapping $F:\mathbf{X}_2 \to \mathbf{X}_1$ use a separate encoder-decoder pair $d_F(f_F(\mathbf{X}_2))$, or at least a separate decoder $d_F(\cdot)$, to do so. Here we introduce a method to perform cross-domain image-to-image translation across multiple domains using a single encoder-decoder architecture. We use an auto-encoder network which, given an input image $\mathbf{X}_1$, first computes a latent domain encoding $Z_d = f_d(\mathbf{X}_1)$ and a latent content encoding $Z_c = f_c(\mathbf{X}_1)$, where the domain encoding $Z_d$ and content encoding $Z_c$ are independent. A decoder network $g(Z_d, Z_c)$ then creates a reconstruction of the original image, $\widehat{\mathbf{X}}_1 = g(Z_d, Z_c) \approx \mathbf{X}_1$. Ideally, the domain encoding $Z_d$ contains no information regarding the content of the image, and the content encoding $Z_c$ contains no information regarding the domain of the image. We use this property of the encodings to find the mapping across domains $G:\mathbf{X}_1 \to \mathbf{X}_2$ by simply changing the domain encoding $Z_d$ of the decoder's input: $G(\mathbf{X}_1) = g(f_d(\mathbf{x}_2^i), f_c(\mathbf{X}_1))$, where $\mathbf{x}_2^i$ is the $i^{th}$ observation of $\mathbf{X}_2$.
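
To make the swapping mechanism concrete, the following is a minimal PyTorch sketch of the idea described in the abstract: a domain encoder $f_d$, a content encoder $f_c$, and a single shared decoder $g$, with translation performed by decoding the content code of $\mathbf{X}_1$ together with the domain code of an observation from $\mathbf{X}_2$. The class names, layer sizes, and the `translate` helper are illustrative assumptions, not the authors' implementation, and the disentanglement objectives that keep $Z_d$ and $Z_c$ independent are omitted.

```python
# Hypothetical sketch of a single encoder-decoder with separate domain/content codes.
# Layer sizes and names are assumptions; training losses are not shown.
import torch
import torch.nn as nn

class DomainEncoder(nn.Module):
    """f_d: image -> domain code Z_d (assumed to be a small vector)."""
    def __init__(self, z_d_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, z_d_dim),
        )

    def forward(self, x):
        return self.net(x)

class ContentEncoder(nn.Module):
    """f_c: image -> content code Z_c (assumed to be a spatial feature map)."""
    def __init__(self, z_c_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, z_c_channels, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """g(Z_d, Z_c): reconstructs an image from the two codes."""
    def __init__(self, z_d_dim=8, z_c_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_c_channels + z_d_dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z_d, z_c):
        # Broadcast the domain vector over the spatial grid and concatenate it
        # with the content feature map before decoding.
        b, _, h, w = z_c.shape
        z_d_map = z_d.view(b, -1, 1, 1).expand(b, z_d.size(1), h, w)
        return self.net(torch.cat([z_c, z_d_map], dim=1))

def translate(f_d, f_c, g, x1, x2_sample):
    """G(X_1) = g(f_d(x_2^i), f_c(X_1)): keep X_1's content, use X_2's domain code."""
    return g(f_d(x2_sample), f_c(x1))

if __name__ == "__main__":
    f_d, f_c, g = DomainEncoder(), ContentEncoder(), Decoder()
    x1 = torch.randn(4, 3, 64, 64)          # batch from domain X_1
    x2_sample = torch.randn(4, 3, 64, 64)   # observations from domain X_2
    recon = g(f_d(x1), f_c(x1))             # reconstruction of X_1
    fake_x2 = translate(f_d, f_c, g, x1, x2_sample)  # X_1 rendered in domain X_2
    print(recon.shape, fake_x2.shape)       # both (4, 3, 64, 64)
```

Because the same decoder serves every domain, extending to additional domains under this sketch only requires supplying a different domain code at decode time rather than training a new encoder-decoder pair.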
