TY - JOUR
T1 - SCTransNet
T2 - Spatial-Channel Cross Transformer Network for Infrared Small Target Detection
AU - Yuan, Shuai
AU - Qin, Hanlin
AU - Yan, Xiang
AU - Akhtar, Naveed
AU - Mian, Ajmal
N1 - Publisher Copyright: © 1980-2012 IEEE.
PY - 2024/4/1
Y1 - 2024/4/1
AB - Infrared small target detection (IRSTD) has recently benefited greatly from U-shaped neural models. However, because they largely overlook effective global information modeling, existing techniques struggle when the target closely resembles the background. We present a Spatial-channel Cross Transformer Network (SCTransNet) that leverages spatial-channel cross transformer blocks (SCTBs) on top of long-range skip connections (SKs) to address this challenge. In the proposed SCTBs, the outputs of all encoders interact through a cross transformer to generate mixed features, which are redistributed to all decoders to reinforce semantic differences between the target and clutter across all levels. Specifically, each SCTB contains two key elements: 1) spatial-embedded single-head channel cross-attention (SSCA), which exchanges local spatial features and full-level global channel information to eliminate ambiguity among the encoders and facilitate high-level semantic associations of the images; and 2) a complementary feed-forward network (CFN), which enhances feature discriminability via a multiscale strategy and cross-spatial-channel information interaction to promote beneficial information transfer. SCTransNet thus encodes the semantic differences between targets and backgrounds, strengthening its internal representation for accurately detecting small infrared targets. Extensive experiments on three public datasets, NUDT-SIRST, NUAA-SIRST, and IRSTD-1K, demonstrate that the proposed SCTransNet outperforms existing IRSTD methods. Our code will be made public at https://github.com/xdFai/SCTransNet.
KW - Convolutional neural network (CNN)
KW - cross-attention
KW - deep learning
KW - infrared small target detection (IRSTD)
KW - transformer
UR - http://www.scopus.com/inward/record.url?scp=85185870999&partnerID=8YFLogxK
U2 - 10.1109/TGRS.2024.3383649
DO - 10.1109/TGRS.2024.3383649
M3 - Article
AN - SCOPUS:85185870999
SN - 0196-2892
VL - 62
SP - 1
EP - 15
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
M1 - 5002615
ER -