Owing to the power of convolutional neural networks (CNNs), CNN-based methods have recently shown promising performance in reconstructing detailed face shape from a single image. The key to the success of CNN-based methods lies in a large amount of labelled data. However, no available dataset provides large-scale face images with their corresponding detailed 3D face geometry. The state-of-the-art learning-based face reconstruction method synthesizes its training data using a coarse morphable face model, but the synthesized face images are not photo-realistic. In this paper, we propose a novel data generation method that uses inverse rendering to render a large number of photo-realistic face images with different attributes. Furthermore, we construct a fine-detailed face image dataset by transferring different scales of details from one image to another. With these datasets, we train two cascaded CNNs in a coarse-to-fine manner. Extensive experimental results demonstrate that our method can reconstruct 3D face shapes with geometric details from only one image, and that it is robust to pose, expression and lighting.