Abstract
Weakly supervised dense object localization (WSDOL) generally relies on Class Activation Mapping (CAM), which exploits the correlation between the class weights of the image classifier and the pixel-level features. Because the image classifier has limited ability to handle intra-class variations, it cannot properly associate pixel features with the target classes, which leads to inaccurate dense localization maps. In this paper, we propose to explicitly construct multi-modal class representations by leveraging Contrastive Language-Image Pre-training (CLIP) to guide dense localization. More specifically, we propose a unified transformer framework that learns class-specific tokens in two modalities, i.e., class-specific visual tokens and class-specific textual tokens. The former capture semantics from the target visual data, while the latter exploit class-related language priors from CLIP; together they provide complementary information for better perceiving intra-class diversity. In addition, we propose to enrich the multi-modal class-specific tokens with sample-specific contexts comprising visual context and image-language context. This enables more adaptive class representation learning, which further facilitates dense localization. Extensive experiments demonstrate the superiority of the proposed method for WSDOL on two multi-label datasets, i.e., PASCAL VOC and MS COCO, and one single-label dataset, i.e., OpenImages. Our dense localization maps also lead to state-of-the-art weakly supervised semantic segmentation (WSSS) results on PASCAL VOC and MS COCO. Code: https://github.com/xulianuwa/MMCST
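The CAM baseline mentioned in the abstract correlates per-class weight vectors with pixel-level features. The following is a minimal NumPy sketch of standard CAM under stated assumptions: the backbone output is a (D, H, W) feature map, each class has one D-dimensional vector, and maps are ReLU-ed and max-normalized per class. The function name `class_activation_maps` is hypothetical, and this is an illustration of generic CAM, not the paper's multi-modal token method.

```python
import numpy as np


def class_activation_maps(features: np.ndarray, class_weights: np.ndarray) -> np.ndarray:
    """CAM-style localization maps (hypothetical helper, generic CAM only).

    features:      (D, H, W) pixel-level feature map from a backbone.
    class_weights: (K, D) one D-dim vector per class, e.g. classifier weights.
    Returns:       (K, H, W) maps, ReLU-ed and max-normalized to [0, 1] per class.
    """
    d, h, w = features.shape
    # Correlate every class vector with every pixel feature: (K, D) @ (D, H*W).
    maps = class_weights @ features.reshape(d, h * w)
    # Keep only positive evidence for each class, then restore spatial layout.
    maps = np.maximum(maps, 0.0).reshape(-1, h, w)
    # Normalize each class map by its peak; all-zero maps are left as zeros.
    peak = maps.reshape(maps.shape[0], -1).max(axis=1)
    peak = np.where(peak > 0, peak, 1.0)
    return maps / peak[:, None, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 4, 4))   # toy features: D=8, H=W=4
    weights = rng.standard_normal((3, 8))    # toy classifier: K=3 classes
    cams = class_activation_maps(feats, weights)
    print(cams.shape)  # (3, 4, 4)
```

Because the function only needs one vector per class, other per-class vectors could be passed in place of classifier weights, for instance learned class tokens or CLIP text embeddings of class prompts projected to dimension D. That substitution is an assumption for illustration; how the paper actually derives maps from its visual and textual class-specific tokens is defined in the paper and its repository.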
Original language | English |
---|---|
Title of host publication | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 19596-19605 |
ISBN (Electronic) | 9798350301298 |
DOIs | |
Publication status | Published - 2023 |
Event | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition - Vancouver Convention Center, Vancouver, Canada; Duration: 18 Jun 2023 → 22 Jun 2023 |
Conference
Conference | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
---|---|
Abbreviated title | CVPR 2023 |
Country/Territory | Canada |
City | Vancouver |
Period | 18/06/23 → 22/06/23 |
-
Intelligent Virtual Human Companions
Bennamoun, M., Laga, H. & Boussaid, F.
ARC Australian Research Council
31/12/21 → 30/12/25
Project: Research
-
Fine-grained Human Action Recognition with Deep Graph Neural Networks
Wang, Z., Bennamoun, M., Hagenbuchner, M., Tsoi, A. C. & Lewis, S.
ARC Australian Research Council
4/01/21 → 31/12/24
Project: Research
-
ARC Research Hub for Driving Farming Productivity and Disease Prevention
ARC Australian Research Council
1/01/19 → 31/12/23
Project: Research