Multi-stage information diffusion for joint depth and surface normal estimation

Zhiheng Fu, Siyu Hong, Mengyi Liu, Hamid Laga, Mohammed Bennamoun, Farid Boussaid, Yulan Guo

Research output: Contribution to journal › Article › peer-review


Depth and surface normal estimation are important for 3D geometric perception, with numerous applications including autonomous driving and robotics. In this paper, we propose a lightweight Multi-stage Information Diffusion Network (MIDNet) for the simultaneous prediction of depth and surface normals from a single RGB image. To obtain semantic and detail-preserving features, we adopt a high-resolution network as our backbone to learn multi-scale features, which are then fused into features shared by the two tasks. To allow the two tasks to mutually boost each other, a Cross-Correlation Attention Module (CCAM) is proposed to adaptively integrate information for the prediction of both tasks in multiple stages, covering both feature-level and task-level information interaction. Ablation studies show that the proposed multi-stage information diffusion strategy improves the performance of both tasks at different levels. Compared to current state-of-the-art methods on the NYU Depth V2, Stanford 2D-3D-Semantic, and KITTI datasets, our method achieves superior performance for both monocular depth and surface normal estimation.
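The abstract does not include an implementation, but the cross-task fusion idea behind the CCAM can be illustrated with a minimal sketch: each task's features are re-weighted by their channel-wise correlation with the other task's features and added back residually. All function and variable names below are hypothetical, and this is only one plausible reading of "cross-correlation attention", not the paper's actual module.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_correlation_attention(f_depth, f_normal):
    """Illustrative cross-task fusion (NOT the paper's CCAM).

    f_depth, f_normal: [N, C] arrays of N spatial locations with C channels.
    Each task's features are updated with information from the other task,
    weighted by channel-wise cross-correlation.
    """
    n = f_depth.shape[0]
    # Channel-wise cross-correlation between the two feature maps: [C, C].
    corr = f_depth.T @ f_normal / n
    # Row-wise softmax: attention of each depth channel over normal channels,
    # and vice versa.
    attn_d = softmax(corr, axis=1)
    attn_n = softmax(corr.T, axis=1)
    # Diffuse information across tasks with a residual connection.
    f_depth_out = f_depth + f_normal @ attn_d.T
    f_normal_out = f_normal + f_depth @ attn_n.T
    return f_depth_out, f_normal_out

rng = np.random.default_rng(0)
fd = rng.normal(size=(16, 8))   # toy depth-branch features
fn = rng.normal(size=(16, 8))   # toy normal-branch features
od, on = cross_correlation_attention(fd, fn)
print(od.shape, on.shape)       # shapes are preserved: (16, 8) (16, 8)
```

The residual form means the module can fall back to the single-task features when the cross-correlation carries no useful signal, which matches the abstract's claim that the interaction is adaptive.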

Original language: English
Article number: 109660
Number of pages: 12
Journal: Pattern Recognition
Early online date: 1 May 2023
Publication status: Published - Sept 2023


