Semantic segmentation of remote sensing images based on multi-scale feature interaction and boundary optimization

张晗; 吴希文; 刘勋

doi:10.5768/JAO202546.0302001


Abstract

Semantic segmentation of remote sensing images is a task of both theoretical significance and practical value. Remote sensing images contain rich ground-object information, and pixels near object boundaries are hard to classify, which makes segmentation difficult. Building on the Mask2Former architecture, two improved architectures are proposed: Mask2Former-MS for multi-scale feature interaction and Mask2Former-BR for boundary optimization. Mask2Former-MS uses bilinear interpolation for up-sampling and down-sampling to fuse features, and introduces a channel attention mechanism to reduce the influence of redundant information. Mask2Former-BR extracts features with an atrous spatial pyramid pooling (ASPP) module, in which each parallel atrous convolution branch uses a different atrous rate to capture feature information at a different scale; ReLU and batch normalization (BN) layers provide activation and normalization to suppress vanishing gradients, so that pixels at boundaries are classified more accurately. Experiments on the Gaofen image dataset (GID), with the U-Net network and the original Mask2Former architecture as baselines, show that the best precision and best accuracy of Mask2Former-MS and Mask2Former-BR are 88.82%, 85.90% and 89.56%, 87.46%, respectively, and that the improved architectures segment better than the baselines.
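The multi-rate idea behind ASPP can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names, the fixed all-ones kernel, and the rates (1, 2, 4) are assumptions chosen for illustration, and a real ASPP module works on 2-D feature maps with learned weights and a BN layer in each branch, which is omitted here. The sketch only shows how the same 3-tap kernel covers a wider context as the atrous rate grows.

```python
# Illustrative 1-D sketch of parallel atrous (dilated) convolution branches.
# Assumptions: names, kernel values and rates are hypothetical, not from the paper.
# Batch normalization is omitted to keep the example small.

def atrous_conv1d(signal, kernel, rate):
    """Same-size 1-D cross-correlation whose kernel taps sit `rate` samples apart."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate      # position of the dilated tap
            if 0 <= j < len(signal):       # zero padding outside the signal
                acc += w * signal[j]
        out.append(acc)
    return out


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aspp_like(signal, kernel, rates=(1, 2, 4)):
    """Run one branch per atrous rate and return the outputs as separate channels."""
    return [relu(atrous_conv1d(signal, kernel, r)) for r in rates]


# A unit impulse makes the spacing of the taps visible in each branch.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernel = [1.0, 1.0, 1.0]
branches = aspp_like(signal, kernel)

for rate, channel in zip((1, 2, 4), branches):
    print(rate, receptive_field(len(kernel), rate), channel)
```

Each branch responds to the impulse at a different spacing, so stacking the branch outputs as channels gives later layers access to context at several scales at once. This is the property the abstract attributes to the parallel ASPP branches.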
