Uncertainty-Aware Medical Diagnostic Phrase Identification and Grounding

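Keywords: Computer science; Phrase; Grounding; Language model; Robustness (evolution); Security token; Closed captioning; Machine learning; Usability; Artificial intelligence; Natural language processing; Image (mathematics); Human-computer interaction; Quantum mechanics; Chemistry; Gene; Computer security; Biochemistry; Physics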

Authors

Ke Zou, Yang Bai, Bo Liu, Yidi Chen, Zhihao Chen, Yang Zhou, Xuedong Yuan, Meng Wang, Xiaojing Shen, Xiaochun Cao, Yih Chung Tham, Huazhu Fu

Source

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [Institute of Electrical and Electronics Engineers]
Date: 2025-08-07; Volume (issue): 47 (12): 11315-11329; Citations: 1

Links

nih.gov, doi.org

Identifier

DOI: 10.1109/tpami.2025.3596878

Abstract

Medical phrase grounding is crucial for identifying relevant regions in medical images based on phrase queries, facilitating accurate image analysis and diagnosis. However, current methods rely on manual extraction of key phrases from medical reports, reducing efficiency and increasing the workload for clinicians. Additionally, the lack of model confidence estimation limits clinical trust and usability. In this paper, we introduce a novel task, Medical Report Grounding (MRG), which aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. To address this challenge, we propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases by embedding a unique token, <BOX>, into the vocabulary to enhance detection capabilities. A vision encoder-decoder processes the embedded token and input image to generate grounding boxes. Critically, uMedGround incorporates an uncertainty-aware prediction model, significantly improving the robustness and reliability of grounding predictions. Experimental results demonstrate that uMedGround outperforms state-of-the-art medical phrase grounding methods and fine-tuned large visual-language models, validating its effectiveness and reliability. This study represents a pioneering exploration of the MRG task, marking the first-ever endeavor in this domain. Additionally, we demonstrate the applicability of uMedGround in medical visual question answering and class-based localization tasks, where it highlights visual evidence aligned with key diagnostic phrases, supporting clinicians in interpreting various types of textual inputs, including free-text reports, visual question answering queries, and class labels.
