Computer science
Image segmentation
Artificial intelligence
Joint (building)
Computer vision
Segmentation
Medical imaging
Scale-space segmentation
Image (mathematics)
Pattern recognition (psychology)
Engineering
Architectural engineering
Authors
Xu Zhang, Huangxuan Zhao, Lefei Zhang, Yuan Xiong
Identifier
DOI: 10.1109/jbhi.2025.3607023
Abstract
The Segment Anything Model (SAM) has attracted considerable attention due to its impressive performance and demonstrates potential in medical image segmentation. Compared to SAM's native point and bounding box prompts, text prompts offer a simpler and more efficient alternative in the medical field, yet this approach remains relatively underexplored. In this paper, we propose a SAM-based framework that integrates a pre-trained vision-language model to generate referring prompts, with SAM handling the segmentation task. The outputs from multimodal models such as CLIP serve as input to SAM's prompt encoder. A critical challenge stems from the inherent complexity of medical text descriptions: they typically encompass anatomical characteristics, imaging modalities, and diagnostic priorities, resulting in information redundancy and semantic ambiguity. To address this, we propose a text decomposition-recomposition strategy. First, clinical narratives are parsed into atomic semantic units (appearance, location, pathology, and so on). These elements are then recombined into optimized text expressions. We employ a cross-attention module among multiple texts to interact with the joint features, ensuring that the model focuses on features corresponding to effective descriptions. To validate the effectiveness of our method, we conducted experiments on several datasets. Compared to the native SAM based on geometric prompts, our model shows improved performance and usability.
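To make the described architecture concrete, below is a minimal sketch of the fusion step the abstract outlines: per-unit text embeddings (one per atomic semantic unit such as appearance, location, or pathology) attend to image features via cross-attention and are projected into prompt tokens for a SAM-style prompt encoder. This is an illustrative reconstruction, not the authors' code; the module name `TextPromptFusion`, the 512-dimensional CLIP-style text embeddings, the 256-dimensional SAM-style feature/prompt dimensions, and the random placeholder tensors are all assumptions.

```python
# Hedged sketch (assumptions, not the paper's implementation): decomposed text
# units are fused with image features via cross-attention and projected to
# prompt tokens that a SAM-style prompt encoder could consume.
import torch
import torch.nn as nn

class TextPromptFusion(nn.Module):
    def __init__(self, text_dim=512, image_dim=256, prompt_dim=256, num_heads=8):
        super().__init__()
        # Project CLIP-style text embeddings into the image feature space.
        self.text_proj = nn.Linear(text_dim, image_dim)
        # Cross-attention: text units (queries) attend to image features (keys/values).
        self.cross_attn = nn.MultiheadAttention(image_dim, num_heads, batch_first=True)
        # Project fused text features to the prompt-token dimension.
        self.prompt_proj = nn.Linear(image_dim, prompt_dim)

    def forward(self, text_embeds, image_feats):
        # text_embeds: (B, N_units, text_dim)  -- one embedding per atomic semantic unit
        # image_feats: (B, H*W, image_dim)     -- flattened image-encoder features
        q = self.text_proj(text_embeds)
        fused, _ = self.cross_attn(q, image_feats, image_feats)
        return self.prompt_proj(fused)           # (B, N_units, prompt_dim) prompt tokens

# Toy usage with random tensors standing in for real CLIP text and SAM image features.
fusion = TextPromptFusion()
text_embeds = torch.randn(2, 4, 512)     # e.g. appearance / location / pathology / modality
image_feats = torch.randn(2, 64 * 64, 256)
prompt_tokens = fusion(text_embeds, image_feats)
print(prompt_tokens.shape)               # torch.Size([2, 4, 256])
```

Under these assumptions, each recomposed text expression contributes its own query token, so the attention weights indicate which descriptions the model relies on for a given image; how the resulting tokens are actually injected into SAM's prompt encoder would follow the paper's design.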