NeuralAnnot: Neural Annotator for in-the-wild Expressive 3D Human Pose and Mesh Training Sets

Abstract

Recovering expressive 3D human pose and mesh from in-the-wild images is highly challenging due to the absence of training data. Several optimization-based methods have been used to obtain 3D human model fits from ground-truth (GT) 2D poses; the fits then serve as pseudo-GT 3D poses and meshes. However, these methods perform per-sample optimization, fitting a 3D human model to each sample independently using only 2D supervision and priors. As a result, running them on a large number of samples takes a long time, and the lack of 3D supervision makes them suffer from severe depth ambiguity. To overcome these limitations, we present NeuralAnnot, a neural annotator that learns to construct in-the-wild expressive 3D human pose and mesh training sets. NeuralAnnot is trained on entire datasets, considering multiple samples together with additional 3D supervision from auxiliary datasets; therefore, it produces far better 3D pseudo-GT fits much faster. We show that the newly obtained training sets bring large performance gains, and we will publicly release them along with our code.
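The depth ambiguity of 2D-only fitting can be made concrete with a toy sketch (not the paper's actual method, and the projection model and loss are assumptions for illustration): under orthographic projection, a 2D reprojection loss gives zero gradient with respect to depth, so per-sample optimization leaves the depth coordinate wherever it was initialized.

```python
# Toy sketch: per-sample fitting of a single 3D joint to a GT 2D observation
# under orthographic projection. The depth coordinate z receives zero gradient
# from the 2D loss -- this is the depth ambiguity the abstract refers to.
import numpy as np

def project_orthographic(joint3d):
    """Drop the depth axis: (x, y, z) -> (x, y)."""
    return joint3d[:2]

def fit_per_sample(joint2d_gt, steps=200, lr=0.1):
    """Gradient descent on the squared 2D reprojection error for ONE sample."""
    joint3d = np.zeros(3)  # initialize at the origin
    for _ in range(steps):
        residual = project_orthographic(joint3d) - joint2d_gt
        # Analytic gradient of ||residual||^2; the z-component is always 0.
        grad = np.concatenate([2.0 * residual, [0.0]])
        joint3d -= lr * grad
    return joint3d

fit = fit_per_sample(np.array([0.3, -0.5]))
# x and y converge to the 2D target; z never moves from its initialization.
```

Repeating such an independent fit for every image in a large dataset is what makes the per-sample approach slow, and the unconstrained depth is what additional 3D supervision (as used by NeuralAnnot) resolves.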
