Abstract
This paper proposes a new transformer-based framework to learn class-specific object localization maps as pseudo labels for weakly supervised semantic segmentation (WSSS). Inspired by the fact that the attended regions of the one-class token in the standard vision transformer can be leveraged to form a class-agnostic localization map, we investigate whether the transformer model can also effectively capture class-specific attention for more discriminative object localization by learning multiple class tokens within the transformer. To this end, we propose a Multi-class Token Transformer, termed MCTformer, which uses multiple class tokens to learn interactions between the class tokens and the patch tokens. The proposed MCTformer can successfully produce class-discriminative object localization maps from the class-to-patch attentions corresponding to different class tokens. We also propose to use a patch-level pairwise affinity, which is extracted from the patch-to-patch transformer attention, to further refine the localization maps. Moreover, the proposed framework is shown to fully complement the Class Activation Mapping (CAM) method, leading to remarkably superior WSSS results on the PASCAL VOC and MS COCO datasets. These results underline the importance of the class token for WSSS.
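One way to picture the class-to-patch and patch-to-patch attention described in the abstract is the small, dependency-free sketch below. It is not the authors' implementation. The function name `class_localization_maps`, the token ordering (class tokens first, then patch tokens), the use of a single attention matrix, and the refinement rule are all assumptions made for illustration.

```python
def class_localization_maps(attn, num_classes, refine=True):
    """Slice one attention matrix into per-class localization scores.

    attn: square list of lists over tokens, assumed to be ordered
          [class tokens..., patch tokens...] (hypothetical layout).
    Returns one list per class with a min-max normalized score per patch.
    """
    c = num_classes
    class_to_patch = [row[c:] for row in attn[:c]]  # shape (C, N)
    patch_to_patch = [row[c:] for row in attn[c:]]  # shape (N, N)

    maps = []
    for scores in class_to_patch:
        if refine:
            # Assumed refinement: each patch gathers the class scores of
            # the patches it attends to, weighted by pairwise affinity.
            scores = [
                sum(a * s for a, s in zip(affinity_row, scores))
                for affinity_row in patch_to_patch
            ]
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero on flat maps
        maps.append([(s - lo) / span for s in scores])
    return maps


# Toy input: 2 class tokens and 3 patch tokens; each row sums to 1,
# as it would after a softmax.
attn = [
    [0.1, 0.1, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.6],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]
maps = class_localization_maps(attn, num_classes=2)
```

In this toy input the first class token attends mostly to the first patch and the second class token to the last patch, so each class gets its own map. Reshaping each list to the patch grid would give a 2D localization map.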
Original language | English |
---|---|
Title of host publication | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) |
Place of Publication | USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 4300-4309 |
Number of pages | 10 |
ISBN (Electronic) | 9781665469463 |
ISBN (Print) | 9781665469463 |
DOIs | |
Publication status | Published - 2022 |
Event | IEEE/CVF Conference on Computer Vision and Pattern Recognition, United States. Duration: 18 Jun 2022 → 24 Jun 2022 |
Conference
Conference | IEEE/CVF Conference on Computer Vision and Pattern Recognition |
---|---|
Abbreviated title | CVPR |
Country/Territory | United States |
Period | 18/06/22 → 24/06/22 |
Fingerprint
Research topics of 'Multi-class Token Transformer for Weakly Supervised Semantic Segmentation'.
- Intelligent Virtual Human Companions
  Bennamoun, M. (Investigator 01), Laga, H. (Investigator 02) & Boussaid, F. (Investigator 03)
  ARC Australian Research Council
  31/12/21 → 30/12/25
  Project: Research
- Fine-grained Human Action Recognition with Deep Graph Neural Networks
  Wang, Z. (Investigator 01), Bennamoun, M. (Investigator 02), Hagenbuchner, M. (Investigator 03), Tsoi, A. C. (Investigator 04) & Lewis, S. (Investigator 05)
  ARC Australian Research Council
  4/01/21 → 31/12/24
  Project: Research