Reconstructing clothed 3D human models from a single image is challenging because the geometry of body regions that are invisible in the image has to be "guessed" by the algorithm. To reduce this difficulty, current state-of-the-art methods usually employ a parametric 3D body model to guide clothed 3D human reconstruction. However, the quality of the reconstructed clothed 3D human models depends heavily on the accuracy of the parametric body model. To address this problem, we propose using a well-aligned parametric body model to guide single-image clothed 3D human reconstruction. First, we adopt the STAR statistical model as the parametric body model and propose a two-stage method that combines a regression-based approach and an optimization-based approach to estimate its pose and shape parameters iteratively. By combining the strengths of the statistical model and this parameter estimation method, a well-aligned 3D body model can be recovered from a single input image. Then, we propose a deep neural network that fuses the 3D geometric information of the parametric body model with visual features extracted from the input image to reconstruct the clothed 3D human model. We also design training losses that align the reconstructed model with the ground-truth model both in the 3D model space and in multi-view 2D re-projection spaces. Quantitative and qualitative experimental results on three public datasets (THuman, BUFF, and LSP) show that our method produces more accurate and robust clothed 3D human reconstructions than state-of-the-art methods.
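To make the idea of aligning a reconstruction "in the 3D model space and the multi-view 2D re-projection spaces" concrete, the following is a minimal sketch, not the authors' loss. It assumes that the predicted and ground-truth models are point sets in one-to-one correspondence, that the 2D views come from orthographic cameras spaced evenly around the vertical (y) axis, and that each term is a mean per-point Euclidean distance. The function names and the weights `w_3d` and `w_2d` are hypothetical, and NumPy is used for the array math.

```python
import numpy as np


def rotation_y(angle: float) -> np.ndarray:
    """Rotation matrix about the vertical (y) axis (assumed camera orbit axis)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def project_orthographic(points: np.ndarray, angle: float) -> np.ndarray:
    """Rotate (N, 3) points about y, then drop depth (z) to get (N, 2) image coords."""
    rotated = points @ rotation_y(angle).T  # applies R to every point
    return rotated[:, :2]


def alignment_loss(pred, gt, num_views: int = 4,
                   w_3d: float = 1.0, w_2d: float = 1.0) -> float:
    """Hypothetical 3D + multi-view 2D alignment loss between corresponding points."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape or pred.ndim != 2 or pred.shape[1] != 3:
        raise ValueError("pred and gt must both have shape (N, 3)")

    # 3D term: mean Euclidean distance between corresponding points.
    loss_3d = np.mean(np.linalg.norm(pred - gt, axis=1))

    # 2D term: average the same distance over evenly spaced virtual views.
    angles = [2.0 * np.pi * k / num_views for k in range(num_views)]
    per_view = [
        np.mean(np.linalg.norm(project_orthographic(pred, a)
                               - project_orthographic(gt, a), axis=1))
        for a in angles
    ]
    loss_2d = np.mean(per_view)

    return float(w_3d * loss_3d + w_2d * loss_2d)
```

For example, shifting every point of a model by one unit along x gives a 3D term of 1 but a 2D term of only 0.5 with four views, because the shift is seen edge-on (as pure depth) in the two side views. Averaging over several views is what lets re-projection terms of this kind penalize misalignments that a single view would miss.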