Semantic scene completion with dense CRF from a single depth image

Liang Zhang, Le Wang, Xiangdong Zhang, Peiyi Shen, Mohammed Bennamoun, Guangming Zhu, Syed Afaq Ali Shah, Juan Song

Research output: Contribution to journal (Article)

Abstract

Scene understanding is a significant research topic in computer vision, especially for robots that must understand their environment intelligently. Semantic scene segmentation helps robots identify the objects present in their surroundings, while semantic scene completion enhances a robot's ability to infer object shape, which is pivotal for several high-level tasks. With a dense Conditional Random Field (CRF), one key issue is how to construct the long-range interactions between nodes with Gaussian pairwise potentials. Another issue is which effective and efficient inference algorithms can be adapted to solve the optimization. In this paper, we focus on optimizing semantic scene segmentation and completion simultaneously using a dense CRF based on a single depth image only. Firstly, we convert the single depth image into different down-sampled Truncated Signed Distance Function (TSDF) or flipped TSDF voxel formats, and formulate the pairwise potential terms with such a representation. Secondly, we use the output of an end-to-end 3D convolutional neural network named SSCNet to obtain the unary potentials. Finally, we examine the efficiency of different CRF inference algorithms (mean-field inference, the negative semi-definite specific difference of convex relaxation, the proximal minimization of linear programming and its variants, etc.). The proposed dense CRF and inference algorithms are evaluated on three datasets (SUNCG, NYU, and NYUCAD). Experimental results demonstrate that the voxel-level intersection over union (IoU) of the predicted semantic segmentation and scene completion reaches the state of the art. Specifically, for voxel semantic segmentation, the highest IoU improvements are 2.6%, 1.3%, and 3.1%, and for scene completion, the highest IoU improvements are 2.5%, 3.7%, and 5.4%, for the SUNCG, NYU, and NYUCAD datasets respectively.
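
The mean-field inference named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Potts label compatibility and a single Gaussian kernel on voxel position only, it builds the full pairwise kernel naively, and the names `mean_field_dense_crf`, `theta`, and `weight` are hypothetical. The toy unary energies stand in for the SSCNet outputs mentioned above.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def mean_field_dense_crf(unary, positions, theta=1.0, weight=1.0, n_iters=5):
    """Naive mean-field inference for a fully connected CRF (Potts model).

    unary:     (N, L) unary energies, e.g. negative log-probabilities.
    positions: (N, 3) voxel centre coordinates.
    theta, weight: assumed kernel bandwidth and strength (illustrative).
    Returns an (N, L) array of approximate per-voxel label marginals.
    """
    # Gaussian pairwise kernel over all voxel pairs: O(N^2) memory,
    # so this is only suitable for small toy grids.
    diff = positions[:, None, :] - positions[None, :, :]
    kernel = weight * np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * theta ** 2))
    np.fill_diagonal(kernel, 0.0)  # a voxel sends no message to itself

    q = softmax(-unary)
    for _ in range(n_iters):
        # message[i, l] = sum_j kernel[i, j] * q[j, l]
        message = kernel @ q
        # Potts compatibility: label l is penalised by the kernel-weighted
        # mass that the other voxels place on every label except l.
        pairwise = message.sum(axis=1, keepdims=True) - message
        q = softmax(-unary - pairwise)
    return q


# Toy example: three voxels on a line, two labels.
# The middle voxel weakly prefers label 1; its neighbours prefer label 0.
probs = np.array([[0.90, 0.10],
                  [0.45, 0.55],
                  [0.90, 0.10]])
unary = -np.log(probs)
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [2.0, 0.0, 0.0]])

q = mean_field_dense_crf(unary, positions)
labels = q.argmax(axis=1)
print(labels)
```

In this toy setting the pairwise term pulls the middle voxel toward the label of its neighbours, which is the smoothing effect a dense CRF is meant to add on top of per-voxel network predictions. A practical implementation would replace the explicit N-by-N kernel with an efficient filtering scheme.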

Original language: English
Pages (from-to): 182-195
Number of pages: 14
Journal: Neurocomputing
Volume: 318
DOI: 10.1016/j.neucom.2018.08.052
Publication status: Published - 27 Nov 2018

Fingerprint

Semantics
Robots
Linear Programming
Computer vision
Technology
Neural networks
Research
Datasets

Cite this

Zhang, Liang ; Wang, Le ; Zhang, Xiangdong ; Shen, Peiyi ; Bennamoun, Mohammed ; Zhu, Guangming ; Shah, Syed Afaq Ali ; Song, Juan. / Semantic scene completion with dense CRF from a single depth image. In: Neurocomputing. 2018 ; Vol. 318. pp. 182-195.
@article{7f5234c492104b1d8ed8d0853b2c8fd9,
title = "Semantic scene completion with dense CRF from a single depth image",
keywords = "Dense conditional random field (CRF), Inference, Semantic scene completion, Single depth image, Truncated signed distance function (TSDF)",
author = "Liang Zhang and Le Wang and Xiangdong Zhang and Peiyi Shen and Mohammed Bennamoun and Guangming Zhu and Shah, {Syed Afaq Ali} and Juan Song",
year = "2018",
month = "11",
day = "27",
doi = "10.1016/j.neucom.2018.08.052",
language = "English",
volume = "318",
pages = "182--195",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Pergamon",

}

TY - JOUR

T1 - Semantic scene completion with dense CRF from a single depth image

AU - Zhang, Liang

AU - Wang, Le

AU - Zhang, Xiangdong

AU - Shen, Peiyi

AU - Bennamoun, Mohammed

AU - Zhu, Guangming

AU - Shah, Syed Afaq Ali

AU - Song, Juan

PY - 2018/11/27

Y1 - 2018/11/27

KW - Dense conditional random field (CRF)

KW - Inference

KW - Semantic scene completion

KW - Single depth image

KW - Truncated signed distance function (TSDF)

UR - http://www.scopus.com/inward/record.url?scp=85053104619&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.08.052

DO - 10.1016/j.neucom.2018.08.052

M3 - Article

VL - 318

SP - 182

EP - 195

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -