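Computer science
Segmentation
Fusion
Artificial intelligence
Computer vision
Remote sensing
Pattern recognition (psychology)
Geology
Philosophy
Linguistics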
Authors
Haixia Feng,Qingwu Hu,Pengcheng Zhao,Shunli Wang,Mingyao Ai,Daoyuan Zheng,Tiancheng Liu
Identifier
DOI:10.1109/tgrs.2025.3553478
Abstract
High-resolution remote sensing images contain rich color and texture information, but due to the inherent limitations of 2-D data, achieving high-quality semantic segmentation remains a challenge. Multimodal data fusion has emerged as an effective approach to overcome this issue. To accurately capture the semantic information in remote sensing images, this study designs a multimodal fusion Transformer-based DeepLabv3+ model for remote sensing semantic segmentation, named FTransDeepLab. Specifically, the network is inspired by the DeepLab architecture and learns features from two modalities. We extend the encoder by stacking the multiscale Segformer, encoding the input images into highly representative spatial features. Additionally, we introduce the multimodal feature rectification (MFR) module and the multimodal feature fusion (MFF) module. The MFR, composed of a channel attention module and a spatial attention module, enhances the model’s ability to capture essential features and improves performance by focusing on both global and local contexts. The MFF module uses a cross-attention mechanism to optimize feature fusion: it integrates features from the different modalities and enhances representation learning by facilitating interaction between them. Finally, in the decoding path, the extracted high-level features are concatenated with low-level features to optimize the feature representation and are then upsampled to restore the size of the input image. Extensive experiments on two datasets, the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam, confirm that the proposed FTransDeepLab achieves superior performance compared with state-of-the-art segmentation methods.
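The abstract describes the MFF module only at a high level, so the following is a minimal sketch of cross-attention fusion between two modalities, not the authors' implementation. It assumes no learned projections (queries, keys, and values are the raw feature tokens), a residual addition followed by channel-wise concatenation, and toy inputs named `optical` and `elevation`; all of these names and choices are hypothetical. It uses plain Python lists for readability, whereas a practical model would use a tensor library.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Let each query token attend over the tokens of the other modality.

    queries:     list of N tokens, each a list of d floats
    keys_values: list of M tokens, each a list of d floats
    Assumption: Q = queries and K = V = keys_values (no learned projections).
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        # Scaled dot-product score of this query against every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Attention-weighted sum of the value tokens, channel by channel.
        attended = [sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)]
        out.append(attended)
    return out


def fuse(feat_a, feat_b):
    """Bidirectional cross-attention, residual add, then channel-wise concat.

    Assumes both modalities have the same number of tokens.
    """
    a_from_b = cross_attention(feat_a, feat_b)
    b_from_a = cross_attention(feat_b, feat_a)
    fused = []
    for a, ab, b, ba in zip(feat_a, a_from_b, feat_b, b_from_a):
        a_enh = [x + y for x, y in zip(a, ab)]  # residual connection, modality A
        b_enh = [x + y for x, y in zip(b, ba)]  # residual connection, modality B
        fused.append(a_enh + b_enh)             # concatenate along channels
    return fused


# Toy inputs (hypothetical): 3 spatial tokens, 2 channels per modality.
optical = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
elevation = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
fused = fuse(optical, elevation)
print(len(fused), len(fused[0]))  # prints "3 4"
```

Each fused token carries both modalities' channels, and each half has been enriched with context from the other modality, which is the kind of interaction the abstract attributes to cross-attention.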