Computer science
Computer vision
Artificial intelligence
Image segmentation
Remote sensing
Segmentation
Geology
Authors
Huihui Li, Huajian Pan, Xiaoyong Liu, Jinchang Ren, Zhiguo Du, Jingjing Cao
Identifier
DOI:10.1109/tgrs.2025.3572127
Abstract
Semantic segmentation of remote sensing images has advanced significantly with the adoption of deep neural networks, combining the strengths of Convolutional Neural Networks (CNNs) in local feature extraction with those of Transformers in global information modeling. However, because CNNs have limited long-range modeling capability and Transformers are constrained by computational complexity, remote sensing semantic segmentation still suffers from serious holes, rough edge segmentation, and false or even missed detections caused by lighting, shadows, and other factors. To address these issues, we propose a visual state space model called GLVMamba, which employs CNNs as the encoder and the proposed Global-Local Visual State Space (GLVSS) block as the core of the decoder. Specifically, the GLVSS block introduces locality forward feedback and a shift window mechanism to address Mamba's insufficient modeling of neighboring pixel dependencies, which enhances the integration of global and local context during feature reconstruction, boosts the model's object perception capability, and effectively refines edge contours. Additionally, a scale-aware pyramid pooling (SCPP) module is proposed to merge features from various scales and adaptively fuse and extract distinguishing features, mitigating holes and false detections. Through the GLVSS block and the SCPP module, GLVMamba effectively captures global-local semantic information and multi-scale features, achieving efficient and accurate remote sensing semantic segmentation. Extensive experiments on two widely used datasets demonstrate the superiority of the proposed method over other state-of-the-art methods. The code will be available at https://github.com/Tokisakiwlp/GLVMamba.