Computer science
Convolutional neural network
Artificial intelligence
Contextual image classification
Transformer
Deep learning
Scalability
Computer vision
Remote sensing
Image (mathematics)
Engineering
Database
Geography
Voltage
Electrical engineering
Authors
Hao Yuan, Kun Liu, Jiechuan Shi, Can Wang, Weiwei Wang
Identifier
DOI:10.1145/3627377.3627448
Abstract
In recent years, the development of deep learning has drawn widespread attention to the Vision Transformer (ViT) as an emerging image classification method. Remote sensing image classification is an important task in the remote sensing field, with broad application prospects. This paper explores a remote sensing image classification method based on the Vision Transformer, addressing the limitations of traditional convolutional neural networks in global perception, context information retrieval, and positional encoding. The Vision Transformer is a deep neural network built on the self-attention mechanism; it can capture global context information in images and has achieved remarkable performance across a range of computer vision tasks. Its classification performance is evaluated and compared on several remote sensing datasets. Experimental results demonstrate that the Vision Transformer-based method achieves outstanding accuracy and generalization ability: compared with traditional convolutional neural networks, it better captures global features in remote sensing images and scales better to large-scale remote sensing data, performing well against state-of-the-art methods. Specifically, the Vision Transformer achieves average classification accuracies of 95.41%, 98.26%, 93.74%, and 95.25% on the AID, UC-Merced, NWPU-RESISC45, and Optimal31 datasets, respectively.
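To make the abstract's description concrete, the following is a minimal numpy sketch of the ViT pipeline it refers to: an image is split into patches, the patches are linearly embedded into a token sequence with a [CLS] token and positional encoding, a self-attention layer mixes global context across all patches, and the [CLS] token is classified. All dimensions, weight initializations, and the single-head/single-layer structure are illustrative assumptions, not the paper's actual model; a real ViT stacks many multi-head attention blocks with learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patchify(img, patch=16):
    # Split an (H, W, C) image into flattened non-overlapping patches:
    # returns (num_patches, patch * patch * C).
    H, W, C = img.shape
    rows, cols = H // patch, W // patch
    p = img[:rows * patch, :cols * patch].reshape(rows, patch, cols, patch, C)
    return p.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch * patch * C)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the whole token
    # sequence -- every patch attends to every other patch, which is
    # the "global context" property the abstract contrasts with CNNs.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

# Toy dimensions (hypothetical); 45 classes matches NWPU-RESISC45.
d_model, n_classes = 64, 45
img = rng.random((224, 224, 3))          # stand-in remote sensing image

patches = patchify(img)                  # (196, 768)
W_embed = rng.standard_normal((patches.shape[1], d_model)) * 0.02
tokens = patches @ W_embed               # linear patch embedding
cls = np.zeros((1, d_model))             # [CLS] token (learned in practice)
x = np.vstack([cls, tokens])             # (197, 64)
x = x + rng.standard_normal(x.shape) * 0.02  # stand-in positional encoding

Wq = rng.standard_normal((d_model, d_model)) * 0.02
Wk = rng.standard_normal((d_model, d_model)) * 0.02
Wv = rng.standard_normal((d_model, d_model)) * 0.02
x = self_attention(x, Wq, Wk, Wv)

W_head = rng.standard_normal((d_model, n_classes)) * 0.02
probs = softmax(x[0] @ W_head)           # classify from the [CLS] token
print(probs.shape)                       # (45,)
```

With random weights the output is of course meaningless; the sketch only shows the data flow (patchify, embed, attend globally, classify) that distinguishes ViT from the local receptive fields of a CNN.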