Computer science
Encoding (memory)
Net (polyhedron)
Artificial intelligence
Computer vision
Pattern recognition (psychology)
Mathematics
Geometry
Authors
Yue Hu, Linbo Qing, Zhixuan Zhang, Zhengyong Wang, Li Guo, Yonghong Peng
Identifier
DOI:10.1016/j.engappai.2024.107909
Abstract
Remote sensing scene (RSS) image classification plays a vital role in fields such as urban planning and environmental protection. However, due to high inter-class similarity and intra-class variability, accurate classification of RSS images remains a considerable challenge for current convolutional neural network (CNN)-based and vision transformer (ViT)-based methods. To address these issues, this paper proposes a novel dual-encoding method, the master–slave encoding network (MSE-Net), designed from the two perspectives of feature extraction and feature fusion. For feature extraction, the master encoder, based on a ViT, extracts higher-level semantic features, while the slave encoder, based on a CNN, captures relatively lower-level spatial structure information. For feature fusion, to integrate the information from the two encoders effectively, the paper further develops two fusion strategies. The first involves auxiliary enhancement units (AEUs), which eliminate semantic divergence between the two encoders, enhance the spatial context awareness of the slave encoder, and promote effective feature learning. The second, the interactive perception unit (IPU), facilitates interaction and integration of the two encoders' representations to extract more discriminative features. In addition, comparative experiments were conducted on four widely used RSS datasets, namely RSSCN7, SIRI-WHU, the aerial image dataset (AID), and NWPU-RESISC45 (NWPU45), to verify the effectiveness of MSE-Net. The experimental results demonstrate that MSE-Net achieves state-of-the-art (SOTA) performance across all four datasets.
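The abstract's dual-encoder pipeline (master features gating and then interacting with slave features) can be sketched roughly as below. This is a minimal illustrative sketch, not the paper's implementation: the function names (`aeu`, `ipu`, `mse_net_forward`) and the specific gating/fusion arithmetic are assumptions, and the real MSE-Net operates on ViT and CNN feature maps rather than plain vectors.

```python
# Illustrative sketch of the master–slave fusion flow described in the
# abstract. All operations here are stand-in assumptions, not the paper's
# actual AEU/IPU layer definitions.

def aeu(master_feat, slave_feat):
    """Auxiliary enhancement unit (sketch): modulate the slave branch with
    the master branch's semantic response to reduce semantic divergence."""
    return [s * (1.0 + m) for m, s in zip(master_feat, slave_feat)]

def ipu(master_feat, slave_feat):
    """Interactive perception unit (sketch): combine both branches,
    including a multiplicative interaction term, into one representation."""
    return [m + s + m * s for m, s in zip(master_feat, slave_feat)]

def mse_net_forward(master_feat, slave_feat):
    """Fusion path: AEU first enhances the slave features, then the IPU
    merges the two branches for the downstream classifier."""
    enhanced_slave = aeu(master_feat, slave_feat)
    return ipu(master_feat, enhanced_slave)

# Toy feature vectors standing in for ViT (master) and CNN (slave) outputs.
fused = mse_net_forward([0.5, 0.2], [1.0, 0.4])
print(fused)  # -> [2.75, 0.776]
```

In the actual network these units would be learned modules applied to high-dimensional feature maps; the sketch only mirrors the data flow (enhance the slave branch, then fuse interactively) described in the abstract.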