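Computer science
Granularity
Embedding
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Transformer
Security token
Inference
Feature extraction
Contextual image classification
Image (mathematics)
Computer security
Voltage
Operating system
Physics
Quantum mechanics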
Authors
Bin Li, Er Ouyang, Wenjing Hu, Guoyun Zhang, Lin Zhao, Jianhui Wu
Identifier
DOI:10.1080/01431161.2022.2142078
Abstract
The superior local context modelling capability of convolutional neural networks (CNNs) in representing features allows greatly enhanced performance in hyperspectral image (HSI) classification tasks by CNN-based methods. However, most of these methods suffer from a restricted receptive field and poor performance in the continuous data domain. To address these issues, we propose a multi-granularity vision transformer via semantic token (MSTViT) for HSI classification, which differs from the existing transformer view by modelling the HSI classification tasks as word embedding problems. Specifically, the MSTViT model extracts multi-level semantic features by a ladder feature extractor and applies a multi-granularity patch embedding module to embed these features simultaneously as different-scale tokens. Moreover, different-granularity tokens are fed to the vision transformer to capture the long-distance dependencies among the different tokens. A depth-wise separable convolution multi-layer perceptron is used to assist the attention mechanism for further excavation of the deep information of HSI. Finally, the performance of HSI classification is improved by fusing the coarse- and fine-granularity representations to generate stronger features. Experimental results on four standard datasets verify the marked improvement of the MSTViT over state-of-the-art CNN and transformer structures. The code of this work is available at https://github.com/zhaolin6/MSTViT for the sake of reproducibility.
Keywords
Hyperspectral image classification; convolutional neural networks; transformer; word embedding; long-distance dependence
Acknowledgment
We would like to take this opportunity to thank the editor and the anonymous reviewers for their outstanding comments and suggestions, which greatly helped us to improve the technical quality and presentation of the article. We would also like to thank Dr. John Olaghere of Hunan Institute of Science and Technology and Prof. Xin-Hua Hu of East Carolina University for their help in reviewing this article.
Disclosure statement
No potential conflict of interest was reported by the authors.
Data availability statement
Data available at https://github.com/zhaolin6/MSTViT.
Additional information
Funding
This work was supported in part by the Natural Science Foundation of Hunan Province of China under Grant 2020JJ4343; in part by the Scientific Research Project of the Hunan Provincial Education Department under Grant 19A201, Grant 19A200, Grant 20A214, and Grant 20A223; in part by the Graduate Research and Innovation Project of Hunan Province under CX20211186; and in part by the Graduate Research and Innovation Project of Hunan Institute of Science and Technology under YCX2021A09.