Asymmetric Mamba–CNN Collaborative Architecture for Large-Size Remote Sensing Image Semantic Segmentation

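Computer science; Image segmentation; Artificial intelligence; Building segmentation; Computer vision; Remote sensing; Image (mathematics); Pattern recognition (psychology); Geology; Geography; Archaeology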

Authors

Jinbo Zhang, Min Chen, Yitao Zhao, Lianlei Shan, Changzhi Li, Han Hu, Xuming Ge, Qing Zhu, Bo Xu

Source

Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2025-01-01 Volume/Issue: 63: 1-19 Citations: 2

Identifier

DOI: 10.1109/tgrs.2025.3589552

Abstract

Large-size remote sensing images contain rich geographical information. Efficient and accurate semantic segmentation of these images is of significant importance in various fields. However, the massive memory requirements have hindered the development of semantic segmentation methods for large-size remote sensing images. Most existing methods struggle to balance memory usage, global modeling, and local representation accuracy. To address these issues, we propose a new semantic segmentation method for large-size remote sensing images, Mamba–CNN parallel network (MCPNet), which demonstrates impressive performance. The method is an asymmetric Mamba–convolutional neural network (CNN) hybrid architecture. Given the linear modeling complexity of Mamba, we construct the M-branch based on the visual state space (VSS) model, which processes downsampled images to reduce memory consumption while alleviating Mamba’s local forgetting problem. To further enhance the model’s capability in fine-grained detail extraction, we meticulously design a detail-preserving network (DPN) as the C-branch. This branch employs a split downsampling strategy and multiscale convolutional kernel groups to process large-size images, ensuring the preservation of spatial positional relationships while capturing fine-grained local details. Moreover, to effectively filter redundant information introduced by large-size images and bridge the semantic gap between the features extracted by CNN and Mamba, we propose a multigated feature fusion module (MG-FFM). This module progressively refines heterogeneous feature alignment through a bottom-up hierarchical refinement strategy, achieving a progressive fusion of semantics and details. Our method achieves state-of-the-art (SOTA) performance in terms of mean intersection over union (mIoU) and mF1 score on the self-constructed Yaan UAV dataset and two widely used public datasets (DeepGlobe and Inria Aerial) while consuming less GPU memory. 
The code will be available at https://github.com/fsqy-zhang/MCPNet
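
The abstract reports results as mean intersection over union (mIoU) and mF1 but does not say how they are computed. As a minimal sketch, assuming the usual per-class definitions averaged over classes (IoU = TP / (TP + FP + FN), F1 = 2TP / (2TP + FP + FN)), the two metrics can be derived from a pixel-level confusion matrix. The function names and the toy labels below are illustrative assumptions and are not taken from the MCPNet code.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou_and_f1(cm):
    """Return (mIoU, mF1), averaged over classes present in truth or prediction."""
    n = len(cm)
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # truth is c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, truth is otherwise
        if tp + fp + fn == 0:
            continue  # class never appears; skip it instead of dividing by zero
        ious.append(tp / (tp + fp + fn))
        f1s.append(2 * tp / (2 * tp + fp + fn))
    if not ious:
        return 0.0, 0.0
    return sum(ious) / len(ious), sum(f1s) / len(f1s)


# Toy example (hypothetical data): six flattened pixels, two classes.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
miou, mf1 = mean_iou_and_f1(cm)
print(f"mIoU={miou:.4f}, mF1={mf1:.4f}")
```

For large images the confusion matrix would normally be accumulated over all tiles before the per-class ratios are taken, so that each class is weighted by its total pixel count rather than by tile.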
