Keywords: Hyperspectral imaging, Artificial intelligence, Image (mathematics), Computer science, Pattern recognition (psychology), Full-spectrum imaging, Contextual image classification, Computer vision, Remote sensing, Geology
Authors
Yu‐Han Chen, Qingyun Yan
Identifier
DOI:10.1109/lgrs.2024.3360184
Abstract
Masked Image Modeling (MIM) has made significant advances across various fields in recent years. Previous research in the hyperspectral (HS) domain often uses conventional Transformers to model spectral sequences, overlooking the impact of local details on HS image classification. Furthermore, training models with raw image features as reconstruction targets entails significant challenges. In this study, we focus on the reconstruction targets and feature-modeling capabilities of the Vision Transformer (ViT) to address the limitations of MIM methods in the HS domain. As a solution, we introduce a novel and effective method called LFSMIM, which incorporates two key strategies: (1) filtering out high-frequency components from the reconstruction target to mitigate the network's sensitivity to noise, and (2) enhancing the local and global modeling capabilities of the ViT to capture weakened texture details and exploit global spectral features. LFSMIM outperforms competing methods in overall accuracy on the Indian Pines, Pavia University, and Houston 2013 datasets, achieving 95.522%, 98.820%, and 98.160%, respectively. The code will be made available at https://github.com/yuweikong/LFSMIM.
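The first strategy, building a reconstruction target with the high-frequency components removed, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filter type (here a hard circular mask in the 2-D FFT domain) and the `cutoff_ratio` value are assumptions for demonstration only.

```python
import numpy as np

def lowpass_target(img, cutoff_ratio=0.25):
    """Return a low-pass-filtered copy of `img` to serve as a MIM
    reconstruction target. High spatial frequencies are zeroed in the
    FFT domain (hypothetical hard cutoff; the paper's exact filter
    design is not specified in the abstract)."""
    # 2-D FFT over the spatial axes, with the DC component centered.
    f = np.fft.fftshift(np.fft.fft2(img, axes=(0, 1)), axes=(0, 1))
    h, w = img.shape[:2]
    # Radial distance of each frequency bin from the center (DC).
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    # Keep only frequencies inside the cutoff radius.
    radius = cutoff_ratio * min(h, w) / 2
    mask = dist <= radius
    if f.ndim == 3:  # broadcast the mask over the spectral bands
        mask = mask[..., None]
    f_low = f * mask
    # Back to the spatial domain; imaginary residue is numerical noise.
    return np.fft.ifft2(np.fft.ifftshift(f_low, axes=(0, 1)),
                        axes=(0, 1)).real

# Example: filter one HS patch of shape (height, width, bands).
rng = np.random.default_rng(0)
patch = rng.random((32, 32, 200)).astype(np.float32)
target = lowpass_target(patch)
```

Training the masked autoencoder against `target` instead of `patch` removes high-frequency content (where sensor noise concentrates) from the regression objective, which is one plausible reading of the abstract's noise-sensitivity argument.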