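Computer science
Artificial intelligence
Minimum bounding box
Convolutional neural network
Computer vision
Human torso
Transformer
Modality (human-computer interaction)
Pattern recognition (psychology)
Image (mathematics)
Quantum mechanics
Medicine
Anatomy
Physics
Voltage
Authors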
Linlin Liu,Hongkai Wang
Identifier
DOI:10.1145/3524086.3524092
Abstract
Localization of multiple organs in Positron Emission Tomography/Computed Tomography (PET/CT) images is a key step in the computer-aided analysis of nuclear medicine images. Human torso organs are highly correlated with each other in location and shape, so exploiting inter-organ geometrical correlation may help improve organ localization accuracy. In this paper, we construct a Transformer network with a one-to-one query architecture for organ bounding box localization in PET/CT images. Our method uses the self-attention mechanism of the Transformer network to model inter-organ correlations of position and size. Compared to the state-of-the-art detection transformer (DETR) network, our one-to-one query architecture has a simpler network structure and faster learning convergence. To address the large demand for three-dimensional (3D) training images, we propose an effective multi-view localization method that is based on a 2D pre-trained Transformer network and back-projects the multi-view 2D bounding boxes into 3D. Moreover, we propose a dual-modality fusion method to combine the complementary information from the PET and CT images. Experimental results on 20 test images demonstrated that our Transformer network is more robust than convolutional neural network (CNN) methods. Our one-to-one query mechanism significantly accelerated model training compared to the DETR model. The fusion of dual-modality information also leads to more robust organ localization results than using either modality alone.
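The abstract does not say how the multi-view 2D bounding boxes are back-projected into 3D. As a rough illustration only, the sketch below assumes three orthogonal projection views (coronal, sagittal, axial), each yielding an axis-aligned 2D box over two of the volume axes, and averages the per-axis extents across the views that cover each axis. The view names, the `VIEW_AXES` mapping, the `back_project` function, and the averaging rule are all hypothetical and may differ from the authors' method.

```python
from statistics import mean

# Hypothetical mapping: which two volume axes each projection view preserves.
VIEW_AXES = {
    "coronal": ("x", "z"),
    "sagittal": ("y", "z"),
    "axial": ("x", "y"),
}


def back_project(boxes_2d):
    """Combine per-view 2D boxes into one axis-aligned 3D box.

    boxes_2d maps a view name to (a_min, a_max, b_min, b_max), where a and b
    are the axes listed for that view in VIEW_AXES. Returns a dict mapping
    each of "x", "y", "z" to a (min, max) pair.
    """
    lows = {"x": [], "y": [], "z": []}
    highs = {"x": [], "y": [], "z": []}
    for view, (a_min, a_max, b_min, b_max) in boxes_2d.items():
        axis_a, axis_b = VIEW_AXES[view]
        lows[axis_a].append(a_min)
        highs[axis_a].append(a_max)
        lows[axis_b].append(b_min)
        highs[axis_b].append(b_max)
    # Every axis must be seen by at least one view to form a 3D box.
    missing = [axis for axis in "xyz" if not lows[axis]]
    if missing:
        raise ValueError(f"no view covers axes: {missing}")
    # Assumed fusion rule: average the estimates from all views per axis.
    return {axis: (mean(lows[axis]), mean(highs[axis])) for axis in "xyz"}


# Two views are enough here: coronal gives x and z, sagittal gives y and z.
box_3d = back_project({
    "coronal": (10, 30, 50, 90),
    "sagittal": (20, 40, 54, 94),
})
```

In this example the z extent is estimated twice and averaged, while x and y each come from a single view. Other fusion rules, such as taking the union or intersection of the per-axis intervals, would fit the same interface.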