TY - JOUR
T1 - Atrous convolutional feature network for weakly supervised semantic segmentation
AU - Xu, Lian
AU - Xue, Hao
AU - Bennamoun, Mohammed
AU - Boussaid, Farid
AU - Sohel, Ferdous
PY - 2021/1/15
Y1 - 2021/1/15
N2 - Weakly supervised semantic segmentation has been attracting increasing attention as it can alleviate the need for expensive pixel-level annotations through the use of image-level labels. Relevant methods mainly rely on the implicit object localization ability of convolutional neural networks (CNNs). However, generated object attention maps remain mostly small and incomplete. In this paper, we propose an Atrous Convolutional Feature Network (ACFN) to generate dense object attention maps. This is achieved by enhancing the context representation of image classification CNNs. More specifically, cascaded atrous convolutions are used in the middle layers to retain sufficient spatial details, and pyramidal atrous convolutions are used in the last convolutional layers to provide multi-scale context information for the extraction of object attention maps. Moreover, we propose an attentive fusion strategy to adaptively fuse the multi-scale features. Our method shows improvements over existing methods on both the PASCAL VOC 2012 and MS COCO datasets, achieving state-of-the-art performance.
KW - Atrous convolution
KW - Attention mechanism
KW - Multi-label image classification
KW - Multi-scale features
KW - Weakly supervised semantic segmentation
UR - http://www.scopus.com/inward/record.url?scp=85092688079&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2020.09.045
DO - 10.1016/j.neucom.2020.09.045
M3 - Article
AN - SCOPUS:85092688079
SN - 0925-2312
VL - 421
SP - 115
EP - 126
JO - Neurocomputing
JF - Neurocomputing
ER -