This paper proposes the novel Pose Guided Person Generation Network (PG²), which synthesizes images of a person in arbitrary poses from a single image of that person and a target pose. Our generation framework PG² uses the pose information explicitly and consists of two key stages: pose integration and image refinement. In the first stage, the condition image and the target pose are fed into a U-Net-like network to generate a coarse initial image of the person in the target pose. The second stage then refines this blurry initial result by training a U-Net-like generator adversarially. Extensive experimental results on both 128×64 re-identification images and 256×256 fashion photos show that our model generates high-quality person images with convincing details.
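The two-stage data flow described above can be sketched at the level of tensor shapes. This is a minimal illustration, not the authors' implementation: the abstract does not say how the pose is encoded or what the second stage receives as input, so the sketch assumes the pose is a stack of keypoint heatmaps (`NUM_KEYPOINTS = 18` is a hypothetical value) and that the refinement generator sees the condition image concatenated with the coarse result. All names (`Shape`, `stage1_pose_integration`, `stage2_refinement`) are hypothetical.

```python
from dataclasses import dataclass

NUM_KEYPOINTS = 18  # assumption: one heatmap channel per body keypoint


@dataclass(frozen=True)
class Shape:
    """Channels-first shape of an image-like tensor."""
    channels: int
    height: int
    width: int


def concat_channels(a: Shape, b: Shape) -> Shape:
    """Shape obtained by stacking two tensors along the channel axis."""
    if (a.height, a.width) != (b.height, b.width):
        raise ValueError("spatial sizes must match to concatenate channels")
    return Shape(a.channels + b.channels, a.height, a.width)


def stage1_pose_integration(condition: Shape, target_pose: Shape):
    """Stage 1: a U-Net-like network maps (condition image, target pose)
    to a coarse RGB image in the target pose.
    Returns (network input shape, coarse output shape)."""
    net_input = concat_channels(condition, target_pose)
    coarse = Shape(3, condition.height, condition.width)
    return net_input, coarse


def stage2_refinement(condition: Shape, coarse: Shape):
    """Stage 2: a U-Net-like generator, trained adversarially, refines the
    coarse image. Feeding it the condition image as well is an assumption.
    Returns (generator input shape, refined output shape)."""
    net_input = concat_channels(condition, coarse)
    refined = Shape(3, coarse.height, coarse.width)
    return net_input, refined


if __name__ == "__main__":
    # 128x64 re-identification setting from the abstract.
    condition = Shape(3, 128, 64)
    target_pose = Shape(NUM_KEYPOINTS, 128, 64)

    s1_in, coarse = stage1_pose_integration(condition, target_pose)
    s2_in, refined = stage2_refinement(condition, coarse)
    print("stage 1 input:", s1_in, "-> coarse:", coarse)
    print("stage 2 input:", s2_in, "-> refined:", refined)
```

The sketch only tracks shapes, so it shows how the two stages chain together without committing to any particular layer configuration or loss.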