Abstract
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR.
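The paper does not reproduce the Mask Consistency Score (MCS) formula here, but the idea of measuring temporal consistency can be illustrated with a minimal sketch: the `mask_consistency_score` function below is a hypothetical stand-in that averages the IoU between predicted masks of consecutive frames, not the authors' actual MCS definition.

```python
import numpy as np

def mask_consistency_score(masks):
    """Illustrative frame-to-frame consistency: mean IoU between
    binary masks of consecutive frames.
    NOTE: a hypothetical sketch, not the MCS metric from the paper."""
    ious = []
    for prev, curr in zip(masks[:-1], masks[1:]):
        inter = np.logical_and(prev, curr).sum()
        union = np.logical_or(prev, curr).sum()
        # Empty-vs-empty frames count as perfectly consistent.
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))

# A perfectly stable object yields a score of 1.0.
stable = [np.ones((4, 4), dtype=bool)] * 3
print(mask_consistency_score(stable))  # 1.0
```

Under this sketch, abrupt identity swaps or flickering segmentations lower the score, which is the kind of failure mode the abstract attributes to temporal context variability and visually similar distractor objects.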
| Original language | English |
|---|---|
| Article number | 10572009 |
| Pages (from-to) | 11373-11385 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 34 |
| Issue number | 11 |
| Early online date | 26 Jun 2024 |
| DOIs | |
| Publication status | Published - 2024 |
Fingerprint

Dive into the research topics of 'Temporally Consistent Referring Video Object Segmentation with Hybrid Memory'. Together they form a unique fingerprint.

Projects

1 Finished
- ARC Research Hub for Driving Farming Productivity and Disease Prevention
  Bennamoun, M. (Investigator 01) & Mian, A. (Investigator 02)
  ARC Australian Research Council
  1/01/19 → 31/12/23
  Project: Research