Keywords: Encoder, Token, Transformer, Computer Science, Artificial Intelligence, Convolutional Neural Network, Pattern Recognition, Computer Vision, Medical Image Analysis
Authors
Fangyuan Yan,Binbin Yan,Mingtao Pei
Identifier
DOI: 10.1109/icip49359.2023.10222303
Abstract
Compared with convolutional neural networks, the vision transformer, with its powerful global modeling ability, has achieved promising results in natural image classification and has been applied to medical image analysis. The vision transformer divides the input image into a token sequence of fixed hidden size and keeps that hidden size constant during training. However, no single fixed size is suitable for all medical images. To address this issue, we propose a new dual transformer encoder model consisting of two transformer encoders with different hidden sizes, so that the model can be trained on two token sequences of different sizes. In addition, when predicting the category, the vision transformer considers only the class token output by the last encoder layer, ignoring the information in the other layers. We therefore use a Layer-wise Class token Attention (LCA) classification module that leverages the class tokens from all encoder layers to predict categories. Extensive experiments show that our proposed model achieves better performance than other transformer-based methods, demonstrating its effectiveness.
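The layer-wise class token attention idea can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; it assumes a simple formulation in which the last layer's class token acts as the query and the class tokens from all layers act as keys and values, and the function name `layerwise_class_attention` and the projection matrices are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layerwise_class_attention(cls_tokens, w_q, w_k, w_v):
    """Attend over the class tokens collected from every encoder layer.

    cls_tokens: (L, d) -- one class token per layer. The last layer's
    token serves as the query; all L tokens serve as keys/values.
    Returns a fused (d,) representation for classification.
    """
    q = cls_tokens[-1] @ w_q               # (d,)
    k = cls_tokens @ w_k                   # (L, d)
    v = cls_tokens @ w_v                   # (L, d)
    scores = (k @ q) / np.sqrt(q.shape[0])  # (L,) scaled dot-product
    attn = softmax(scores)                 # attention over layers
    return attn @ v                        # weighted sum of layer tokens

# Toy usage, assuming 12 layers with hidden size 64 (illustrative values).
rng = np.random.default_rng(0)
num_layers, hidden = 12, 64
cls = rng.normal(size=(num_layers, hidden))
w_q, w_k, w_v = (rng.normal(size=(hidden, hidden)) * 0.1 for _ in range(3))
fused = layerwise_class_attention(cls, w_q, w_k, w_v)
print(fused.shape)  # (64,)
```

The fused vector would then be fed to a linear classifier; in contrast, a standard vision transformer would use only `cls_tokens[-1]` directly.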