Artificial intelligence
Computer science
Convolutional neural network
Medical imaging
Deep learning
Transformer
Grayscale
Pattern recognition (psychology)
Artificial neural network
Computer vision
Contextual image classification
Machine learning
Image (mathematics)
Authors
Lulu Gai,Wei Chen,Rui Gao,Yanwei Chen,Xu Qiao
Identifier
DOI:10.1109/icip46576.2022.9897966
Abstract
Convolutional Neural Networks (CNNs) have been the dominant deep learning approach in automated medical image diagnosis for a decade. Recently, vision transformers (ViTs) have emerged as a competitive alternative to CNNs in computer vision, yielding similar levels of performance while possessing several interesting properties that could prove beneficial for explaining deep neural networks. Since most medical images are grayscale scans (CT, MRI, etc.) in three-dimensional (3-D) space, which differ substantially from natural images, we explore whether it is possible to move to transformer-based models or whether we should keep working with CNNs for 3-D medical image classification. If so, what are the advantages and drawbacks of switching to ViTs for medical image diagnosis? We consider these questions in a series of experiments on three 3-D medical image datasets. Our findings show that, while CNNs perform better when trained from scratch, ViTs benefit strongly from pre-training on ImageNet and outperform their CNN counterparts on the large datasets when using self-supervised learning and sharpness-aware minimization.
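The abstract attributes part of the ViTs' advantage to sharpness-aware minimization (SAM). As a rough illustration only (not the authors' implementation, and independent of any deep learning framework), SAM evaluates the gradient at an adversarially perturbed point before updating the original weights; the function names and the toy quadratic objective below are illustrative assumptions:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sketched step of sharpness-aware minimization (SAM).

    First perturb the weights in the direction of steepest ascent,
    scaled to norm `rho`; then apply the gradient computed at that
    perturbed ("sharp") point to the original weights.
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g) + 1e-12   # guard against division by zero
    eps = rho * g / norm               # ascent perturbation of norm rho
    g_sharp = grad_fn(w + eps)         # gradient at the perturbed point
    return w - lr * g_sharp            # descend using the sharp gradient

# Toy example: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda w: 2.0 * w)
# w moves toward the flat minimum at the origin
```

In practice SAM wraps a base optimizer (e.g. SGD) and requires two forward-backward passes per step, which roughly doubles training cost; the paper's finding is that this trade-off pays off for ViTs on sufficiently large medical datasets.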