Computer science
Artificial intelligence
Fundus (eye)
Computer vision
Transformer (machine learning)
Contextual image classification
Pattern recognition
Image
Ophthalmology
Medicine
Engineering
Authors
Shuang Yu,Kai Ma,Qi Bi,Cheng Bian,Munan Ning,Nanjun He,Yuexiang Li,Hanruo Liu,Yefeng Zheng
Identifier
DOI:10.1007/978-3-030-87237-3_5
Abstract
With the advancement and prevailing success of Transformer models in the natural language processing (NLP) field, an increasing number of research works have explored the applicability of Transformers for various vision tasks and reported superior performance compared with convolutional neural networks (CNNs). However, as properly training a Transformer generally requires an extremely large quantity of data, it has rarely been explored for medical imaging tasks. In this paper, we adopt the Vision Transformer for retinal disease classification by pre-training the Transformer model on a large fundus image database and then fine-tuning it on downstream retinal disease classification tasks. In addition, to fully exploit the feature representations extracted from individual image patches, we propose a multiple instance learning (MIL) based ‘MIL head’, which can be conveniently attached to the Vision Transformer in a plug-and-play manner and effectively enhances model performance on downstream fundus image classification tasks. The proposed MIL-VT framework achieves superior performance over CNN models on two publicly available datasets when trained and tested under the same setup. The implementation code and pre-trained weights are released for public access (Code link: https://github.com/greentreeys/MIL-VT).
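The abstract describes an MIL head that aggregates the feature representations of individual image patches into an image-level prediction. As a rough illustration of that idea (not the authors' released code; all names, dimensions, and the attention-based pooling form are assumptions), a minimal numpy sketch of gated-attention-style MIL pooling over ViT patch tokens might look like:

```python
import numpy as np

# Hypothetical sketch: an attention-based MIL head pooling Vision
# Transformer patch embeddings into one image-level representation.
# Parameters are randomly initialised here; in practice they are learned.
rng = np.random.default_rng(0)

num_patches, embed_dim, hidden_dim, num_classes = 196, 768, 128, 5

# Patch-token embeddings from a ViT backbone (class token excluded).
patches = rng.standard_normal((num_patches, embed_dim))

# Illustrative MIL attention and classifier weights.
W1 = rng.standard_normal((hidden_dim, embed_dim)) * 0.01
w2 = rng.standard_normal((1, hidden_dim)) * 0.01
W_cls = rng.standard_normal((num_classes, embed_dim)) * 0.01

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# One attention score per patch; softmax turns scores into weights.
scores = w2 @ np.tanh(W1 @ patches.T)      # shape (1, num_patches)
weights = softmax(scores, axis=-1)         # attention over patches

# Weighted sum of patch embeddings -> image-level ("bag") feature.
bag = (weights @ patches).squeeze(0)       # shape (embed_dim,)

# Linear head on the pooled feature gives disease-class logits.
logits = W_cls @ bag                       # shape (num_classes,)
probs = softmax(logits)
```

In such a design the head only consumes the patch tokens, so it can be attached to a pre-trained ViT without modifying the backbone, matching the plug-and-play property the abstract claims.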