TY - JOUR
T1 - MBT-UNet
T2 - Multi-Branch Transform Combined with UNet for Semantic Segmentation of Remote Sensing Images
AU - Liu, Bin
AU - Li, Bing
AU - Sreeram, Victor
AU - Li, Shuofeng
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/8
Y1 - 2024/8
AB - Remote sensing (RS) images play an indispensable role in many key fields such as environmental monitoring, precision agriculture, and urban resource management. Traditional deep convolutional neural networks have the problem of limited receptive fields. To address this problem, this paper introduces a hybrid network model that combines the advantages of CNN and Transformer, called MBT-UNet. First, a multi-branch encoder design based on the pyramid vision transformer (PVT) is proposed to effectively capture multi-scale feature information; second, an efficient feature fusion module (FFM) is proposed to optimize the collaboration and integration of features at different scales; finally, in the decoder stage, a multi-scale upsampling module (MSUM) is proposed to further refine the segmentation results and enhance segmentation accuracy. We conduct experiments on the ISPRS Vaihingen dataset, the Potsdam dataset, the LoveDA dataset, and the UAVid dataset. Experimental results show that MBT-UNet surpasses state-of-the-art algorithms in key performance indicators, confirming its superior performance in high-precision remote sensing image segmentation tasks.
KW - convolutional neural network
KW - remote sensing
KW - semantic segmentation
KW - transformer
UR - https://www.scopus.com/pages/publications/85200876314
U2 - 10.3390/rs16152776
DO - 10.3390/rs16152776
M3 - Article
AN - SCOPUS:85200876314
SN - 2072-4292
VL - 16
JO - Remote Sensing
JF - Remote Sensing
IS - 15
M1 - 2776
ER -