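Interpretability
Computer science
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Machine learning
Transformer
Computer vision
Quantum mechanics
Physics
Voltage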
Authors
Clément Playout, Renaud Duval, Marie Carole Boucher, Farida Chériet
Identifier
DOI:10.1016/j.media.2022.102608
Abstract
Vision Transformers have recently emerged as a competitive architecture in image classification. The tremendous popularity of this model and its variants comes from its high performance and its ability to produce interpretable predictions. However, both of these characteristics remain to be assessed in depth on retinal images. This study proposes a thorough performance evaluation of several Transformers compared to traditional Convolutional Neural Network (CNN) models for retinal disease classification. Special attention is given to multi-modality imaging (fundus and OCT) and generalization to external data. In addition, we propose a novel mechanism to generate interpretable predictions via attribution maps. Existing attribution methods from Transformer models have the disadvantage of producing low-resolution heatmaps. Our contribution, called Focused Attention, uses iterative conditional patch resampling to tackle this issue. By means of a survey involving four retinal specialists, we validated both the superior interpretability of Vision Transformers compared to the attribution maps produced from CNNs and the relevance of Focused Attention as a lesion detector.
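The low-resolution issue mentioned in the abstract can be illustrated with a minimal sketch. It assumes a 224x224 input and 16x16 patches, which are common Vision Transformer defaults and are not taken from the abstract. The helper names `patch_grid` and `upsample_nearest` are hypothetical. This is not the Focused Attention method itself; it only shows why a per-patch attribution map is coarse before any refinement.

```python
def patch_grid(image_size: int, patch_size: int) -> int:
    """Number of patches along one side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return image_size // patch_size


def upsample_nearest(patch_scores, grid, patch_size):
    """Expand row-major per-patch scores into a pixel-level heatmap.

    Each score is simply repeated over its patch, so the heatmap has
    only grid * grid distinct cells regardless of the image resolution.
    """
    if len(patch_scores) != grid * grid:
        raise ValueError("expected grid * grid scores")
    size = grid * patch_size
    return [
        [patch_scores[(y // patch_size) * grid + (x // patch_size)]
         for x in range(size)]
        for y in range(size)
    ]


# Assumed sizes (not from the abstract): 224-pixel image, 16-pixel patches.
grid = patch_grid(224, 16)
print(f"{grid} x {grid} attribution cells for a 224 x 224 image")
```

Under these assumed sizes, one attribution value covers a 16x16 pixel block, which may be larger than a small retinal lesion.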