Local-to-global Cost Aggregation for Semantic Correspondence

Zi Wang, Zhiheng Fu, Yulan Guo, Zhang Li, Qifeng Yu

Research output: Contribution to journal › Article › peer-review


Establishing visual correspondences across semantically similar images is challenging due to intra-class variations, viewpoint changes, repetitive patterns, and background clutter. Recent approaches achieve promising performance by focusing on cost aggregation. However, these methods fail to jointly utilize local and global cues to suppress unreliable matches. In this paper, we propose a cost aggregation network with convolutions and transformers, dubbed CACT. Unlike existing methods, CACT refines the correlation map in a local-to-global manner by exploiting the strengths of convolutions and transformers at different stages. Additionally, considering the bidirectional nature of the correlation map, we propose a dual-path learning framework whose two paths operate in parallel. Thanks to this framework, the cost aggregator can be built from 2D blocks, which improves the efficiency of our model. Experimental results on the SPair-71k, PF-PASCAL, and PF-WILLOW datasets show that the proposed method outperforms most state-of-the-art methods.
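
To make the "bidirectional nature of the correlation map" concrete, the following is a minimal sketch, not the CACT implementation. It assumes the correlation map is the cosine similarity between every source and target feature vector, and that the two directions are the map and its transpose. The function names and the toy features are hypothetical, and the convolution and transformer aggregation stages are not shown.

```python
import math

def l2_normalize(v):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def correlation_map(src_feats, tgt_feats):
    """Cosine similarity between every source and every target feature.

    Returns a list of rows: one row per source position, one column per
    target position.
    """
    src = [l2_normalize(f) for f in src_feats]
    tgt = [l2_normalize(f) for f in tgt_feats]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]

def transpose(corr):
    """Swap the roles of the two images (rows become columns)."""
    return [list(col) for col in zip(*corr)]

def best_matches(corr):
    """For each row, the index of the highest-scoring column."""
    return [max(range(len(row)), key=row.__getitem__) for row in corr]

# Toy features (assumed): 3 source positions, 2 target positions, 2 channels.
src_feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
tgt_feats = [[0.0, 2.0], [3.0, 0.0]]

corr_st = correlation_map(src_feats, tgt_feats)  # source -> target view
corr_ts = transpose(corr_st)                     # target -> source view

print(best_matches(corr_st))  # best target index for each source position
print(best_matches(corr_ts))  # best source index for each target position
```

Both views hold the same scores, but each is indexed by a different image, which is what allows each direction to be processed by its own path.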

Original language: English
Pages (from-to): 1209-1222
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Issue number: 3
Early online date: 2022
Publication status: Published - 1 Mar 2023

