Artificial intelligence
Feature (linguistics)
Computer science
Modality (human–computer interaction)
Computer vision
Dual (grammatical number)
Flow (mathematics)
Pattern recognition (psychology)
Mathematics
Art
Philosophy
Linguistics
Geometry
Literature
Authors
Quan Tang,Liming Xu,Yongheng Wang,Bochuan Zheng,Jiancheng Lv,Xianhua Zeng,Weisheng Li
Identifier
DOI:10.1016/j.media.2024.103413
Abstract
Medical report generation is a cross-modal task that produces medical text, aiming to provide professional descriptions of medical images in clinical language. Although existing methods have made progress, limitations remain, including insufficient focus on lesion areas, omission of internal edge features, and difficulty in aligning cross-modal data. To address these issues, we propose Dual-Modality Visual Feature Flow (DMVF) for medical report generation. First, we introduce region-level features on top of grid-level features to enhance the method's ability to identify lesions and key areas. Then, we enhance the two feature flows according to their respective attributes to prevent the loss of key information. Finally, we align the visual mappings from the different visual features with report text embeddings through a feature fusion module to perform cross-modal learning. Extensive experiments on four benchmark datasets demonstrate that our approach outperforms state-of-the-art methods on both natural language generation and clinical efficacy metrics.
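The abstract's pipeline (pool two visual feature streams, fuse them, project into the report's text embedding space, and score the alignment) might be sketched roughly as below. This is a minimal illustration under assumed shapes, not the authors' DMVF implementation; the names `fuse_and_align` and `w_fuse` are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Unit-normalize a vector so a dot product becomes cosine similarity.
    return x / (np.linalg.norm(x) + eps)

def fuse_and_align(grid_feats, region_feats, text_emb, w_fuse):
    """Toy stand-in for dual-stream fusion and cross-modal alignment.

    grid_feats:   (n_grid, d) grid-level visual features
    region_feats: (n_region, d) region-level visual features
    text_emb:     (d_text,) report text embedding
    w_fuse:       (d_text, 2*d) learned fusion/projection matrix (assumed)
    Returns a cosine-similarity alignment score in [-1, 1].
    """
    grid_vec = grid_feats.mean(axis=0)        # pool the grid stream
    region_vec = region_feats.mean(axis=0)    # pool the region stream
    fused = np.concatenate([grid_vec, region_vec])  # (2*d,)
    visual_emb = w_fuse @ fused               # project into text space
    return float(l2_normalize(visual_emb) @ l2_normalize(text_emb))
```

In practice such a score would feed a contrastive or matching loss during training; here it simply shows how the two feature flows are kept separate until the fusion step.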