TY - GEN
T1 - Self-Supervised Video Object Segmentation by Motion-Aware Mask Propagation
AU - Miao, Bo
AU - Bennamoun, Mohammed
AU - Gao, Yongsheng
AU - Mian, Ajmal
N1 - Funding Information:
This research was funded by the Australian Research Council Industrial Transformation Research Hub IH180100002.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - We propose a self-supervised spatio-temporal matching method, coined Motion-Aware Mask Propagation (MAMP), for video object segmentation. MAMP leverages the frame reconstruction task for training without the need for annotations. During inference, MAMP builds a dynamic memory bank and propagates masks according to our proposed motion-aware spatio-temporal matching module, which is able to handle fast motion and long-term matching scenarios. Evaluations on the DAVIS-2017 and YouTube-VOS datasets show that MAMP achieves state-of-the-art performance with stronger generalization ability than existing self-supervised methods, i.e., 4.2% higher mean J&F on DAVIS-2017 and 4.85% higher mean J&F on the unseen categories of YouTube-VOS than the nearest competitor. Moreover, MAMP performs on par with many supervised video object segmentation methods. Our code is available at: https://github.com/bo-miao/MAMP.
KW - Self-supervised Learning
KW - Video Object Segmentation
UR - http://www.scopus.com/inward/record.url?scp=85133577534&partnerID=8YFLogxK
U2 - 10.1109/ICME52920.2022.9859966
DO - 10.1109/ICME52920.2022.9859966
M3 - Conference paper
AN - SCOPUS:85133577534
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - ICME 2022 - IEEE International Conference on Multimedia and Expo 2022, Proceedings
PB - IEEE, Institute of Electrical and Electronics Engineers
T2 - 2022 IEEE International Conference on Multimedia and Expo, ICME 2022
Y2 - 18 July 2022 through 22 July 2022
ER -