Computer vision
Computer science
Image segmentation
Artificial intelligence
Segmentation
Medical imaging
Image (mathematics)
Computer graphics (images)
Authors
Md. Motiur Rahman, Saeka Rahman, Smriti Bhatt, Miad Faezipour
Identifier
DOI:10.1109/jbhi.2025.3569491
Abstract
Precise medical image segmentation is important for automating diagnosis and treatment planning in healthcare. While images carry the most significant information for segmenting organs with deep learning models, text reports provide complementary details that can be leveraged to improve segmentation precision. The performance gain depends on the proper utilization of the text reports together with the corresponding images. Most attention modules focus on single-modality computation of spatial, channel, or pixel-level attention; they are ineffective at cross-modal alignment, which raises issues in multi-modal scenarios. This study addresses these gaps by presenting a text-assisted vision (TAV) model for medical image segmentation with a novel attention computation module named the triguided attention module (TGAM). TGAM computes visual-visual, language-language, and language-visual attention, enabling the model to capture the important features and the correlations between images and medical notes. This module helps the model identify the relevant features within images, within text annotations, and in the interactions between text annotations and visual features. We incorporate an attention gate (AG) that modulates the influence of TGAM, ensuring it does not overwhelm the encoded features with irrelevant or redundant information while maintaining their uniqueness. We evaluated the performance of TAV on two popular datasets containing images and corresponding text annotations. TAV establishes a new state of the art, improving performance by 2-7% over other models. Extensive experiments demonstrate the effectiveness of each component of the proposed model. The code and datasets are available on GitHub.
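The abstract describes three attention pathways (visual-visual, language-language, language-visual) fused under a gate that limits how much cross-modal context is injected. The sketch below illustrates one plausible wiring of such a tri-guided block using standard scaled dot-product attention; the class name, dimensions, and gating formulation are hypothetical illustrations for readers, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TriGuidedAttentionSketch(nn.Module):
    """Hypothetical sketch of a tri-guided attention block:
    visual-visual and language-language self-attention plus
    language-to-visual cross-attention, fused through a sigmoid gate."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.vis_self = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.txt_self = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Gate in (0, 1): limits how strongly cross-modal context is injected,
        # so text signals do not overwhelm the encoded visual features
        # (assumed stand-in for the paper's attention gate, AG).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, N_patches, dim) flattened image features
        # txt: (B, N_tokens, dim) encoded report tokens
        v, _ = self.vis_self(vis, vis, vis)       # visual-visual attention
        t, _ = self.txt_self(txt, txt, txt)       # language-language attention
        c, _ = self.cross(v, t, t)                # language-visual attention
        g = self.gate(torch.cat([v, c], dim=-1))  # per-feature gating weights
        return v + g * c                          # gated fusion with residual

if __name__ == "__main__":
    vis = torch.randn(2, 196, 256)  # e.g. a 14x14 patch grid
    txt = torch.randn(2, 32, 256)   # e.g. 32 report tokens
    out = TriGuidedAttentionSketch(256)(vis, txt)
    print(out.shape)                # torch.Size([2, 196, 256])
```

The residual plus gated-fusion form keeps the visual stream dominant while letting report context modulate it, matching the abstract's stated goal of preserving the uniqueness of the encoded features.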