ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2006.11404
20
0

Auto-Encoding for Shared Cross Domain Feature Representation and Image-to-Image Translation

11 June 2020
S. Pal
ArXivPDFHTML
Abstract

Image-to-image translation is a subset of computer vision and pattern recognition problems where our goal is to learn a mapping between input images of domain X1\mathbf{X}_1X1​ and output images of domain X2\mathbf{X}_2X2​. Current methods use neural networks with an encoder-decoder structure to learn a mapping G:X1→X2G:\mathbf{X}_1 \to\mathbf{X}_2G:X1​→X2​ such that the distribution of images from X2\mathbf{X}_2X2​ and G(X1)G(\mathbf{X}_1)G(X1​) are identical, where G(X1)=dG(fG(X1))G(\mathbf{X}_1) = d_G (f_G (\mathbf{X}_1))G(X1​)=dG​(fG​(X1​)) and fG(⋅)f_G (\cdot)fG​(⋅) is referred as the encoder and dG(⋅)d_G(\cdot)dG​(⋅) is referred to as the decoder. Currently, such methods which also compute an inverse mapping F:X2→X1F:\mathbf{X}_2 \to \mathbf{X}_1F:X2​→X1​ use a separate encoder-decoder pair dF(fF(X2))d_F (f_F (\mathbf{X}_2))dF​(fF​(X2​)) or at least a separate decoder dF(⋅)d_F (\cdot)dF​(⋅) to do so. Here we introduce a method to perform cross domain image-to-image translation across multiple domains using a single encoder-decoder architecture. We use an auto-encoder network which given an input image X1\mathbf{X}_1X1​, first computes a latent domain encoding Zd=fd(X1)Z_d = f_d (\mathbf{X}_1)Zd​=fd​(X1​) and a latent content encoding Zc=fc(X1)Z_c = f_c (\mathbf{X}_1)Zc​=fc​(X1​), where the domain encoding ZdZ_dZd​ and content encoding ZcZ_cZc​ are independent. And then a decoder network g(Zd,Zc)g(Z_d,Z_c)g(Zd​,Zc​) creates a reconstruction of the original image X^1=g(Zd,Zc)≈X1\mathbf{\widehat{X}}_1=g(Z_d,Z_c )\approx \mathbf{X}_1X1​=g(Zd​,Zc​)≈X1​. Ideally, the domain encoding ZdZ_dZd​ contains no information regarding the content of the image and the content encoding ZcZ_cZc​ contains no information regarding the domain of the image. We use this property of the encodings to find the mapping across domains G:X→YG: X\to YG:X→Y by simply changing the domain encoding ZdZ_dZd​ of the decoder's input. G(X1)=d(fd(x2i),fc(X1))G(\mathbf{X}_1 )=d(f_d (\mathbf{x}_2^i ),f_c (\mathbf{X}_1))G(X1​)=d(fd​(x2i​),fc​(X1​)) where x2i\mathbf{x}_2^ix2i​ is the ithi^{th}ith observation of X2\mathbf{X}_2X2​.

View on arXiv
Comments on this paper