Computer science
Closed captioning
Transformer
Artificial intelligence
Standardization
Natural language processing
Semantics (computer science)
Information retrieval
Image (mathematics)
Programming language
Quantum mechanics
Operating system
Physics
Voltage
Authors
Yiming Cao, Lizhen Cui, Fuqiang Yu, Lei Zhang, Zhen Li, Ning Liu, Yonghui Xu
Identifier
DOI:10.1007/978-3-031-00129-1_8
Abstract
Writing medical image reports is an inefficient and time-consuming task for doctors. Automatically generating medical reports is an essential task of medical data mining, which can alleviate the workload of doctors and improve the standardization of reports. However, the existing methods mainly adopt the CNN-RNN structure to align image features with text features. This structure has difficulty dealing with the dependencies between distant text locations, leading to inconsistent context and semantics in the generated report. In this paper, we propose a knowledge-driven transformer (KdTNet) model for generating coherent medical reports. First, the visual grid and graph convolutional modules are devised to extract fine-grained visual features. Second, we adopt the transformer decoder to generate the hidden semantic states. Subsequently, a BERT-based auxiliary language module is employed to obtain the context language features of reports from the pre-defined medical term knowledge. We design a multimodal information fusion module to adaptively calculate the contribution of visual and linguistic features to report generation. Extensive experiments on two real datasets demonstrate that our KdTNet model achieves superior performance in captioning metrics and human evaluation compared with the state-of-the-art methods.
Keywords: Medical data mining, Medical report generation, Transformer model
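The abstract describes a fusion module that adaptively weights visual features against linguistic features when generating the report. The KdTNet paper's exact formulation is not given here, so the following is only a minimal sketch of one common way such adaptive fusion is implemented (a learned sigmoid gate over the concatenated modalities); the class name `MultimodalFusion`, the shared `hidden_dim`, and the gating form are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an adaptive multimodal fusion gate; assumes the visual
# and linguistic feature streams have already been projected to a shared size.
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Learns a per-dimension weight balancing the two modalities.
        self.gate = nn.Linear(2 * hidden_dim, hidden_dim)

    def forward(self, visual: torch.Tensor, linguistic: torch.Tensor) -> torch.Tensor:
        # visual, linguistic: (batch, seq_len, hidden_dim)
        alpha = torch.sigmoid(self.gate(torch.cat([visual, linguistic], dim=-1)))
        # Convex combination: alpha weights the visual features and (1 - alpha)
        # the linguistic features, so the model can favor either source per token.
        return alpha * visual + (1 - alpha) * linguistic


if __name__ == "__main__":
    fusion = MultimodalFusion(hidden_dim=512)
    v = torch.randn(2, 60, 512)   # e.g. transformer decoder states conditioned on the image
    l = torch.randn(2, 60, 512)   # e.g. BERT-based auxiliary language-module features
    print(fusion(v, l).shape)     # torch.Size([2, 60, 512])
```

The fused representation would then feed the output projection that predicts the next report token; the gate lets the model lean on the language prior when visual evidence is weak, and vice versa.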