Computer science
Transformer
Convolutional neural network
Artificial intelligence
Remote sensing
Engineering
Electrical engineering
Voltage
Geology
Authors
Chao Xie, Shengyu Zhao, Shutong Ye, Yeqi Fei, Xinyan Dai, Yap-Peng Tan
Identifier
DOI:10.1109/jstars.2025.3589424
Abstract
In the field of remote sensing scene classification, persistent challenges such as interclass similarity and intraclass diversity—stemming from the inherent complexity of remote sensing scenes—continue to impede progress. Although convolutional neural networks (CNNs) and vision transformers (ViTs) have both demonstrated commendable performance in this domain, CNNs often struggle to capture global dependencies, while ViTs show deficiencies in extracting localized image features. To overcome these limitations, we design a transformer network called ACTFormer, which integrates convolution, self-attention, and attention mechanisms, effectively combining the local feature extraction capability of convolution with the global dependency modeling ability of self-attention. In addition, within ACTFormer, we design an adaptive focus attention module, which enables the network to focus more precisely and effectively on salient regions while filtering out irrelevant background noise. We also introduce a hybrid loss function that combines center loss with cross-entropy loss to further reduce intraclass variance and enhance interclass distinctions. Extensive experiments on three benchmark remote sensing datasets (i.e., AID, NWPU, and UCM) demonstrate the effectiveness of our proposed method.
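As a rough illustration (not the authors' implementation), the hybrid objective described in the abstract — cross-entropy for interclass separation plus a center-loss term that pulls each feature toward its class center to shrink intraclass variance — can be sketched in NumPy as follows. The balancing weight `lam`, the feature dimensionality, and the class centers here are illustrative assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(features, labels, centers):
    # Mean squared distance between each feature vector and the
    # center of its ground-truth class (centers: one row per class).
    diffs = features - centers[labels]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def hybrid_loss(logits, features, labels, centers, lam=0.01):
    # Cross-entropy sharpens interclass boundaries; the center term
    # compacts each class in feature space. `lam` (assumed value)
    # balances the two objectives.
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)
```

In practice the class centers are trainable parameters updated alongside the network; this sketch only shows how the two loss terms combine.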