Robust monocular 3D face reconstruction under challenging viewing conditions

Hoda Mohaghegh, Farid Boussaid, Hamid Laga, Hossein Rahmani, Mohammed Bennamoun

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Despite extensive research, 3D face reconstruction from a single image remains an open research problem due to the high degree of variability in pose, occlusions and complex lighting conditions. While deep learning-based methods have achieved great success, they are usually limited to near-frontal images that are free of occlusions. The lack of diverse training data with 3D annotations also considerably limits the performance of such methods. As a result, existing methods fail to recover facial details with high fidelity, especially when dealing with images captured under extreme conditions. To address this issue, we propose an unsupervised coarse-to-fine framework for the reconstruction of 3D faces with detailed textures. Our core idea is that multiple images of the same person captured under different viewing conditions should yield the same 3D face. We thus propose to leverage a self-augmentation learning technique to train a model that is robust to diverse variations. In addition, instead of directly employing image pixels, we use a set of discriminative features describing the identity and attributes of the face as input to the refinement module, making the model invariant to viewing conditions. This combination of self-augmentation learning with rich face-related features allows the reconstruction of plausible facial details even under challenging viewing conditions. We train the model end-to-end in a self-supervised manner, without any 3D annotations, landmarks or identity labels, using a combination of an image-level photometric loss and a perception-level loss that is identity- and attribute-aware. We evaluate the proposed approach on the CelebA and AFLW2000 datasets, and demonstrate its robustness to appearance variations despite learning from unlabeled images. The qualitative comparisons indicate that our method produces detailed 3D faces even under extreme occlusions, out-of-plane rotations and noise perturbations, where existing state-of-the-art methods often fail. We also show quantitatively that our method outperforms state-of-the-art methods by more than 30.14%, 9.87% and 11.3% in terms of PSNR, SSIM and identity similarity, respectively. © 2022 Elsevier B.V. All rights reserved.
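
The training objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the loss weights, the use of cosine distance for the perception-level term, and the explicit consistency term between two augmented views are all assumptions made for illustration, and the feature vectors stand in for the output of some identity/attribute network that is not specified here.

```python
import numpy as np

def photometric_loss(rendered, target):
    """Image-level term: mean absolute difference between the rendered
    face and the input image (both H x W x 3 arrays in [0, 1])."""
    return float(np.mean(np.abs(rendered - target)))

def perceptual_loss(feat_rendered, feat_target, eps=1e-8):
    """Perception-level term (assumed form): one minus the cosine
    similarity of two feature vectors, e.g. identity embeddings."""
    a = feat_rendered / (np.linalg.norm(feat_rendered) + eps)
    b = feat_target / (np.linalg.norm(feat_target) + eps)
    return 1.0 - float(a @ b)

def consistency_loss(params_a, params_b):
    """Self-augmentation term (assumed form): two augmented views of the
    same face should map to the same 3D face parameters."""
    return float(np.mean((params_a - params_b) ** 2))

def total_loss(rendered, target, feat_rendered, feat_target,
               params_a, params_b, w_photo=1.0, w_perc=0.2, w_cons=0.1):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_photo * photometric_loss(rendered, target)
            + w_perc * perceptual_loss(feat_rendered, feat_target)
            + w_cons * consistency_loss(params_a, params_b))

# Toy usage with random stand-ins for images, features and parameters.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
render = np.clip(image + 0.05 * rng.standard_normal(image.shape), 0.0, 1.0)
feat_img, feat_ren = rng.random(128), rng.random(128)
params_view1 = rng.random(80)
params_view2 = params_view1 + 0.01 * rng.standard_normal(80)
loss = total_loss(render, image, feat_ren, feat_img, params_view1, params_view2)
```

In an actual training loop these terms would be computed on differentiable tensors so that gradients can flow back to the reconstruction network; NumPy is used here only to keep the sketch self-contained.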

Original language: English
Pages (from-to): 82-93
Number of pages: 12
Journal: Neurocomputing
Volume: 520
Publication status: Published - 1 Feb 2023
