From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation

Subject tags: Bridging (networking), Gaze, Artificial Intelligence, Computer Vision, Computer Science, Segmentation, Image Segmentation, Vision Science, Natural Language Processing, Computer Networks
Authors
Jingkun Chen, Haoran Duan, Xiao Zhang, Boyan Gao, Vicente Grau, Jungong Han
Source
Journal: IEEE Transactions on Medical Imaging [Institute of Electrical and Electronics Engineers]
Volume/Issue: pp. 1-1
Identifier
DOI: 10.1109/TMI.2025.3616598
Abstract

Medical image segmentation remains challenging due to the high cost of pixel-level annotations for training. In the context of weak supervision, clinician gaze data captures regions of diagnostic interest; however, its sparsity limits its use for segmentation. In contrast, vision-language models (VLMs) provide semantic context through textual descriptions but lack the precision required for explanation. Recognizing that neither source alone suffices, we propose a teacher-student framework that integrates both gaze and language supervision, leveraging their complementary strengths. Our key insight is that gaze data indicates "where" clinicians focus during diagnosis, while VLMs explain "why" those regions are significant. To implement this, the teacher model first learns from gaze points enhanced by VLM-generated descriptions of lesion morphology, establishing a foundation for guiding the student model. The teacher then directs the student through three strategies: (1) multi-scale feature alignment to fuse visual cues with textual semantics; (2) confidence-weighted consistency constraints to focus on reliable predictions; and (3) adaptive masking to limit error propagation in uncertain areas. Experiments on the Kvasir-SEG, NCI-ISBI, and ISIC datasets show that our method achieves Dice scores of 80.78%, 80.53%, and 84.22%, respectively, improving by 3-5% over gaze baselines without increasing the annotation burden. By preserving correlations among predictions, gaze data, and lesion descriptions, our framework also maintains clinical interpretability. This work illustrates how integrating human visual attention with AI-generated semantic context can effectively overcome the limitations of individual weak supervision signals, thereby advancing the development of deployable, annotation-efficient medical AI systems. Code is available at: https://github.com/jingkunchen/FGI.
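
The abstract's strategies (2) and (3), confidence-weighted consistency and adaptive masking, can be pictured with a minimal teacher-student sketch. This is not the authors' released implementation (see the repository linked above for that); the confidence measure, the threshold tau, and the function name consistency_loss are illustrative assumptions.

```python
# Minimal sketch (assumed, not the authors' code) of a confidence-weighted,
# adaptively masked consistency loss between teacher and student predictions.
import torch


def consistency_loss(teacher_logits: torch.Tensor,
                     student_logits: torch.Tensor,
                     tau: float = 0.8) -> torch.Tensor:
    """Penalize student-teacher disagreement only where the teacher is confident.

    teacher_logits, student_logits: per-pixel segmentation logits, shape (B, 1, H, W).
    tau: confidence threshold for the adaptive mask (illustrative value).
    """
    with torch.no_grad():
        teacher_prob = torch.sigmoid(teacher_logits)       # teacher foreground probability
        confidence = (teacher_prob - 0.5).abs() * 2.0       # 0 = maximally uncertain, 1 = confident
        mask = (confidence >= tau).float()                   # adaptive mask: drop uncertain pixels

    student_prob = torch.sigmoid(student_logits)
    per_pixel = (student_prob - teacher_prob) ** 2           # per-pixel squared disagreement
    weighted = confidence * mask * per_pixel                 # confidence-weighted, masked
    return weighted.sum() / (mask.sum() + 1e-6)              # average over retained pixels
```

In a sketch like this, the mask limits error propagation by ignoring pixels where the teacher is uncertain, while the confidence weighting emphasizes the most reliable teacher predictions among those that remain.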