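Computer science
Transformer
Artificial intelligence
Deep learning
Block (permutation group theory)
Information fusion
Pattern recognition (psychology)
Spatial analysis
Feature learning
Data mining
Remote sensing
Voltage
Engineering
Geometry
Mathematics
Geology
Electrical engineering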
Authors
Jiayin Song, Yi‐Ming Fan, Wenlong Song, Hongwei Zhou, Liusong Yang, Qiqi Huang, Zhuoyuan Jiang, Chuangqi Wang, Ting Liao
Identifier
DOI:10.1080/01431161.2023.2285739
Abstract
Remote sensing image scene classification is a fundamental task in the intelligent interpretation of remote sensing images. Although Transformers possess a powerful attention mechanism, they require lengthy training to reach good performance. To address this issue, this paper proposes SwinHCST, a novel deep learning network model that combines a CNN with the Swin Transformer. First, the model uses a Weighted Normalized CNN to quickly extract low-level image features. Second, the Receptive Field Block module performs multi-scale information fusion. Third, the Information Fusion Transformer further mines deep-level image features. In addition, the paper designs a plug-and-play Cross Spatial Information Fusion Block, which encodes dimensional information and extracts global information to enhance information exchange. Scene classification experiments show that the proposed model outperforms other methods on the three selected datasets and achieves excellent performance without requiring large amounts of data or training. Specifically, its classification accuracy on the three datasets is 93.76%, 93.60%, and 98.10%, which is 1.7% to 3.71% higher than ResNet50 and 3.7% to 5.7% higher than Swin Transformer.