Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs for Visual Question Localized-Answering in Robotic Surgery

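Question answering, Computer science, Artificial intelligence, Visual reasoning, Computer vision
Authors
Pengfei Hao, Hongqiu Wang, Guang Yang, Lei Zhu
Source
Journal: IEEE Journal of Biomedical and Health Informatics [Institute of Electrical and Electronics Engineers]
Volume/issue: : 1-17
Identifier
DOI:10.1109/jbhi.2025.3538324
Abstract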

Expert surgeons often have heavy workloads and cannot promptly respond to queries from medical students and junior doctors about surgical procedures. Thus, research on Visual Question Localized-Answering in Surgery (Surgical-VQLA) is essential to help medical students and junior doctors understand surgical scenarios. Surgical-VQLA aims to generate accurate answers and locate the relevant areas in the surgical scene, which requires models to identify and understand surgical instruments, operative organs, and procedures. A key issue is the model's ability to accurately distinguish surgical instruments. Current Surgical-VQLA models rely primarily on sparse textual information, which limits their visual reasoning capabilities. To address this issue, we propose a framework called Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs (EnVR-LPKG) for the Surgical-VQLA task. This framework enhances the model's understanding of the surgical scenario by utilizing knowledge graphs of surgical instruments constructed by a Large Language Model (LLM). Specifically, we design a Fine-grained Knowledge Extractor (FKE) to extract the most relevant information from the knowledge graphs and perform contrastive learning between the extracted knowledge graphs and the local image. Furthermore, we design a Multi-attention-based Surgical Instrument Enhancer (MSIE) module, which employs knowledge graphs to obtain an enhanced representation of the corresponding surgical instrument in the global scene. Through the MSIE module, the model learns how to fuse visual features with knowledge graph text features, thereby strengthening its understanding of surgical instruments and further improving its visual reasoning capabilities. Extensive experimental results on the EndoVis-17-VQLA and EndoVis-18-VQLA datasets demonstrate that our proposed method outperforms other state-of-the-art methods. We will release our code for future research.
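
The abstract mentions contrastive learning between extracted knowledge and the local image but does not state which loss is used. As a rough illustration only, the sketch below shows a generic symmetric InfoNCE-style contrastive loss over paired image and text feature vectors. It is not the authors' implementation: the function names (`l2_normalize`, `info_nce`), the temperature value, and the use of plain Python lists as features are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(image_feats, text_feats, temperature=0.07):
    """Generic symmetric InfoNCE loss; row i of each input is a matched pair.

    Hypothetical helper for illustration, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    n = len(imgs)

    # Cosine similarities between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    # Image-to-text uses the rows; text-to-image uses the columns.
    cols = [list(c) for c in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Toy features: matched pairs should give a lower loss than swapped pairs.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(images, matched_texts)
loss_swapped = info_nce(images, swapped_texts)
```

In this toy setup the loss is small when each image vector aligns with its own text vector and large when the pairs are swapped, which is the behavior a contrastive objective of this kind is meant to encourage.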