Computer vision
Computer science
Artificial intelligence
Transformer
Engineering
Electrical engineering
Voltage
Authors
Pengyuan Lv,Wenjun Wu,Yanfei Zhong,Liangpei Zhang
Identifier
DOI:10.1109/igarss46834.2022.9883054
Abstract
As an important approach to the semantic understanding of remote sensing images, scene classification has received much attention in recent years. The convolutional neural network (CNN) is the representative deep learning method for scene classification and is a powerful feature extractor. However, the multilevel features in a CNN are acquired by hierarchical convolutional layers, which have difficulty capturing the interactions among different objects in a scene. The vision transformer (ViT) model offers a new way to understand an image by directly modeling the contextual information of local patches. This paper reviews recent progress on ViT models in computer vision and remote sensing. The major contributions are as follows: 1) a brief review of traditional scene classification methods; 2) an introduction to ViT-based models for scene classification and a comparison with CNN models; 3) experiments with recent ViT models on the UCM and NWPU datasets, with analysis of the results.