Computer Science
Artificial Intelligence
Feature Learning
Segmentation
Convolutional Neural Network
Pattern Recognition (Psychology)
Embedding
Overfitting
Transformer
Image Segmentation
Medical Imaging
Deep Learning
Unsupervised Learning
Machine Learning
Artificial Neural Network
Physics
Quantum Mechanics
Voltage
Authors
Thanaporn Viriyasaranon,Sang Myung Woo,Jang‐Hwan Choi
Identifier
DOI:10.1109/jbhi.2023.3237596
Abstract
Recently, transformer-based architectures have been shown to outperform classic convolutional architectures and have rapidly been established as state-of-the-art models for many medical vision tasks. Their superior performance can be explained by the ability of their multi-head self-attention mechanism to capture long-range dependencies. However, they tend to overfit on small- or even medium-sized datasets because of their weak inductive bias. As a result, they require massive labeled datasets, which are expensive to obtain, especially in the medical domain. This motivated us to explore unsupervised semantic feature learning without any form of annotation. In this work, we aimed to learn semantic features in a self-supervised manner by training transformer-based models to segment the numerical signals of geometric shapes inserted on original computed tomography (CT) images. Moreover, we developed a Convolutional Pyramid vision Transformer (CPT) that leverages multi-kernel convolutional patch embedding and local spatial reduction in each of its layers to generate multi-scale features, capture local information, and reduce computational cost. Using these approaches, we noticeably outperformed state-of-the-art deep learning-based segmentation and classification models on a liver cancer CT dataset of 5,237 patients, a pancreatic cancer CT dataset of 6,063 patients, and a breast cancer MRI dataset of 127 patients.
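The abstract names a multi-kernel convolutional patch embedding as a core component of the CPT encoder but gives no implementation details. The following is a minimal sketch, assuming the idea is to embed image patches through parallel convolutions with different kernel sizes and concatenate their outputs into one token embedding; the module name, channel dimensions, and kernel sizes are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of a multi-kernel
# convolutional patch embedding: several parallel convolutions with
# different kernel sizes embed each patch, and their outputs are
# concatenated along the channel dimension.
import torch
import torch.nn as nn


class MultiKernelPatchEmbedding(nn.Module):
    """Hypothetical patch-embedding layer; names and defaults are assumptions."""

    def __init__(self, in_channels=1, embed_dim=96, patch_size=4,
                 kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert embed_dim % len(kernel_sizes) == 0
        branch_dim = embed_dim // len(kernel_sizes)
        # Each branch downsamples by the patch size; padding of k // 2
        # keeps the output grids of all kernel sizes aligned.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, branch_dim, kernel_size=k,
                      stride=patch_size, padding=k // 2)
            for k in kernel_sizes
        ])
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (B, C, H, W) -> concatenate branch outputs along channels.
        feats = torch.cat([branch(x) for branch in self.branches], dim=1)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, embed_dim)
        return self.norm(tokens), (h, w)


if __name__ == "__main__":
    # Toy CT-like input: batch of 2 single-channel 64x64 slices.
    emb = MultiKernelPatchEmbedding()
    tokens, grid = emb(torch.randn(2, 1, 64, 64))
    print(tokens.shape, grid)  # torch.Size([2, 256, 96]) (16, 16)
```

Concatenating branches with different receptive fields is one plausible way to let a single embedding layer capture both fine and coarse local structure before the transformer stages; how CPT actually combines kernels, and how its local spatial reduction is applied per layer, is not specified in the abstract.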