Authors
Hongyan Tang, Wenbo Li, Zhenxing Huang, Yaping Wu, Jianmin Yuan, Yang Yang, Yan Zhang, Yongfeng Yang, Hairong Zheng, Dong Liang, Meiyun Wang, Zhanli Hu
Abstract
Background: Multimodal medical imaging methods, such as positron emission tomography/computed tomography (PET/CT), are widely used for diagnosing diseases because they provide both structural and functional information. However, PET/CT is limited in its ability to visualize soft tissue, particularly in brain diseases, which highlights the need for magnetic resonance imaging (MRI).

Purpose: Given the limited availability of PET/magnetic resonance (PET/MR) scanners for acquiring MR images and the discomfort that elderly cancer patients experience during lengthy MR scans, a promising alternative is to synthesize MR images from other modalities. Whereas previous research has focused mainly on structure-to-structure modality transitions, such as CT-to-MR synthesis, this study explores a new function-to-structure transition: PET-to-MR synthesis. Specifically, we propose a structural semantic-guided deep learning network that synthesizes MR images from PET data, simplifying the medical imaging workflow and improving both efficiency and accessibility.

Methods: We propose a structural semantic-guided deep learning network with a dual cross-attention (DCA) module that synthesizes MR images from PET data, realizing the function-to-structure modality transition. The network introduces a structural semantic loss to preserve structural information and details, and the DCA module uses cross-attention to capture the channel and spatial interdependencies among multiscale features. The proposed method was compared with other deep learning-based methods, including 3DUXNET, UNETR, nnFormer, CycleGAN, Pix2pix, the edge-aware generative adversarial network (Ea-GAN), and MedNet. Model performance was evaluated both visually and quantitatively; the quantitative evaluation of the synthesized images comprised a correlation analysis based on regional pixel averages, a semantic (segmentation) assessment, and an assessment on additional data. An ablation experiment was also conducted to validate the contributions of the structural semantic loss and the DCA module.

Results: The experiments demonstrate that the proposed method yields superior visual and quantitative outcomes, with a peak signal-to-noise ratio (PSNR) of 29.09 dB, a structural similarity index measure (SSIM) of 0.8417, and a mean absolute error (MAE) of 0.0296. The correlation analysis based on pixel averages shows a fitted slope of 0.957 in the left caudate, and the semantic segmentation results reveal a Dice score of 0.8977 in the left thalamus proper. These findings indicate that the images synthesized by the proposed method are consistent with the ground truth (GT) and preserve structural semantic information. The ablation analysis further shows that both the structural semantic loss and the DCA module enhance model performance.

Conclusion: We propose a PET-to-MR synthesis method that introduces a structural semantic loss to preserve semantic information and incorporates attention mechanisms into the synthesis network to capture global information. Visual, quantitative, and segmentation results illustrate that the proposed method achieves excellent synthesis performance. In future work, we will apply the method to other modality-synthesis tasks and to clinical practice.
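The abstract states that the DCA module uses cross-attention to capture channel and spatial interdependencies among multiscale features, but gives no implementation details. The PyTorch sketch below is therefore only one plausible reading: the 2D tensor shapes, 1x1-convolution projections, scaling factors, and fusion by concatenation plus a residual connection are all assumptions, not the paper's architecture.

```python
# Minimal sketch of a dual cross-attention (DCA) block, assuming two
# feature maps already resampled to a common (b, c, h, w) shape.
import torch
import torch.nn as nn

class DualCrossAttention(nn.Module):
    """Fuses a feature map with context features via channel- and
    spatial-wise cross-attention (hypothetical realization)."""

    def __init__(self, channels: int):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.k_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.v_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.out = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.q_proj(x).flatten(2)    # (b, c, h*w), queries from x
        k = self.k_proj(ctx).flatten(2)  # (b, c, h*w), keys from the other scale
        v = self.v_proj(ctx).flatten(2)  # (b, c, h*w)

        # Channel cross-attention: (b, c, c) affinity between channels.
        chan_attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)
        chan_out = (chan_attn @ v).view(b, c, h, w)

        # Spatial cross-attention: (b, h*w, h*w) affinity between positions.
        spat_attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        spat_out = (v @ spat_attn.transpose(1, 2)).view(b, c, h, w)

        # Fuse both attention paths and keep a residual connection.
        return self.out(torch.cat([chan_out, spat_out], dim=1)) + x

# Usage: DualCrossAttention(64)(torch.randn(1, 64, 32, 32),
#                               torch.randn(1, 64, 32, 32))
```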
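The structural semantic loss is likewise described only by its purpose (preserving structural information and details). One common way to realize such a loss is to combine a pixel-wise term with a term that compares semantic maps produced by a frozen segmentation network; the sketch below follows that pattern, and `seg_net` and the weight `lam` are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of a structural semantic loss: pixel-wise L1 plus
# an L1 penalty on semantic maps from a frozen segmentation network.
import torch
import torch.nn.functional as F

def structural_semantic_loss(fake_mr, real_mr, seg_net, lam=0.1):
    pixel_loss = F.l1_loss(fake_mr, real_mr)
    with torch.no_grad():
        real_sem = seg_net(real_mr)   # semantic maps of the ground truth
    fake_sem = seg_net(fake_mr)       # gradients still flow to the generator
    semantic_loss = F.l1_loss(fake_sem, real_sem)
    return pixel_loss + lam * semantic_loss
```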
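The reported PSNR, SSIM, and MAE are standard full-reference image-quality metrics; as a reference for how such numbers are typically obtained, here is a minimal sketch using scikit-image and NumPy, assuming intensities normalized to [0, 1] (the paper's normalization is not stated in the abstract).

```python
# Standard image-quality metrics between a synthetic and a real MR slice.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(fake_mr: np.ndarray, real_mr: np.ndarray) -> dict:
    return {
        "PSNR": peak_signal_noise_ratio(real_mr, fake_mr, data_range=1.0),
        "SSIM": structural_similarity(real_mr, fake_mr, data_range=1.0),
        "MAE": float(np.mean(np.abs(real_mr - fake_mr))),
    }
```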
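Finally, the Results report two regional consistency measures: a fitted slope of per-region pixel averages (0.957 in the left caudate) and a Dice score between segmentations (0.8977 in the left thalamus proper). The sketch below shows how such measures can be computed; how the ROIs are defined and aggregated per subject is an assumption here.

```python
# Regional consistency checks: (1) fit a line between per-subject ROI pixel
# averages of synthetic vs. ground-truth MR (slope near 1 means agreement);
# (2) Dice overlap between binary segmentation masks.
import numpy as np

def fitted_slope(real_means: np.ndarray, fake_means: np.ndarray) -> float:
    """real_means/fake_means: per-subject pixel averages within one ROI."""
    slope, _intercept = np.polyfit(real_means, fake_means, deg=1)
    return float(slope)

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Overlap between binary masks (e.g., left thalamus proper)."""
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())
```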