Abstract:
Semantic segmentation of remote sensing images is of both theoretical importance and practical value. Remote sensing images contain rich feature information, and pixels near object boundaries are hard to classify, which makes segmentation difficult. Building on Mask2Former, two improved structures are proposed: Mask2Former-MS, a multi-scale feature interaction structure, and Mask2Former-BR, a boundary optimization structure. Mask2Former-MS uses bilinear interpolation for up-sampling and down-sampling to fuse features across scales, and introduces a channel attention mechanism to reduce the influence of redundant information. Mask2Former-BR uses an atrous spatial pyramid pooling (ASPP) module for feature extraction: each parallel atrous convolution branch of ASPP uses a different dilation rate to capture feature information at a different scale, while ReLU and batch normalization (BN) layers provide activation and normalization, suppressing vanishing gradients and making the classification of boundary pixels more accurate. Experiments on the Gaofen Image Dataset (GID), with U-Net and the original Mask2Former as baselines, show that the best precision and best accuracy are 88.82% and 85.90% for Mask2Former-MS and 89.56% and 87.46% for Mask2Former-BR, and that both improved structures segment better than the baselines.
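To illustrate why parallel atrous branches with different dilation rates see different scales, the following is a minimal 1-D sketch in plain Python. It is not the Mask2Former-BR implementation: a real ASPP module works on 2-D feature maps with learned kernels followed by BN and ReLU, and the function names, the fixed kernel, and the rates (1, 2, 3) used here are assumptions chosen only for illustration.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Same-length 1-D atrous convolution (cross-correlation form, zero padding).

    Assumes an odd-length kernel so that it has a well-defined centre tap.
    """
    k = len(kernel)
    half = (k // 2) * rate  # zero padding needed on each side
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * rate] for j in range(k))
        for i in range(len(signal))
    ]


def aspp_branches(signal, kernel, rates):
    """Apply one kernel at several dilation rates; one output per parallel branch."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]  # a single "boundary" pixel
    outputs = aspp_branches(impulse, kernel=[1, 1, 1], rates=(1, 2, 3))  # rates are illustrative
    for rate, out in outputs.items():
        print(rate, effective_kernel_size(3, rate), out)
```

With the same three taps, a larger rate spreads the response over a wider span without adding parameters, which is the property the parallel branches rely on to gather context at several scales.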