全色胶片
计算机科学
人工智能
多光谱图像
图像分辨率
变压器
编码器
卷积神经网络
计算机视觉
模式识别(心理学)
量子力学
操作系统
物理
电压
作者
Xiangchao Meng,Nan Wang,Feng Shao,Shutao Li
出处
期刊:IEEE Transactions on Geoscience and Remote Sensing
[Institute of Electrical and Electronics Engineers]
日期:2022-01-01
卷期号:60: 1-11
被引量:24
标识
DOI:10.1109/tgrs.2022.3168465
摘要
Pansharpening is a fundamental and hot-spot research topic in remote sensing image fusion. In recent years, self-attention-based transformer has attracted considerable attention in natural language processing (NLP) and introduced to attend to computer vision (CV) tasks. Inspired by great success of the vision transformer (ViT) in image classification, we propose an improved and advanced purely transformer-based model for pansharpening. In the proposed method, stacked multispectral (MS) and panchromatic (PAN) images are cropped into patches (i.e., tokens), and after a three-layer self-attention-based encoder, these tokens contain rich information. After upsampled and stitched, a high spatial resolution (HR) MS image is finally obtained. Instead of convolutional neural networks (CNNs) pursuing a short-distance dependency, our proposed method aims to build up a long-distance dependency, to make full use of more useful features. The experiments were conducted on an opening benchmark dataset, including IKONOS with four-band MS/PAN images and WorldView-2 MS images featured by eight bands. In addition, the experiments were performed on reduced and full-resolution datasets from both qualitative and quantitative evaluation aspects. The experimental results indicate the competitive performance of the proposed model than other pansharpening methods, including the state-of-the-art pansharpening algorithms based on CNN.
科研通智能强力驱动
Strongly Powered by AbleSci AI