A survey of medical image captioning technique: encoding, decoding and latest advance

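Keywords: computer science, closed captioning, recurrent neural network, artificial intelligence, convolutional neural network, deep learning, encoder, decoding methods, background (archaeology), encoding (memory), feature vector, task (project management), pattern recognition (psychology), artificial neural network, computer vision, image (mathematics), algorithm, paleontology, biology, operating system, management, economics
Authors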
Yi Zhu,Xiu Li
Source
Journal: Journal of Image and Graphics [University of Portsmouth]
Volume (issue): 28 (7): 1990-2010 Citations: 1
Identifier
DOI:10.11834/jig.211021
Abstract

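As medical imaging technology keeps improving, the number of medical reports that radiologists must write each day keeps growing. With the rise of deep learning, deep-learning-based medical image captioning has been used to generate medical reports automatically, with notable results. This paper comprehensively reviews recent work on deep medical image captioning, including the latest methods, datasets and evaluation metrics, analyzes their respective strengths and weaknesses, and presents them organized by model structure; it is the first survey in China devoted to the medical image captioning task. Current deep medical image captioning techniques mainly extend the encoder-decoder structure, including but not limited to retrieval methods, template matching, attention mechanisms, reinforcement learning and knowledge graphs. Retrieval and template matching are simple, yet they still perform well on this task because of the particular nature of medical reports; attention mechanisms let the model focus on a specific part of the image and text when generating a report, and have been adopted by almost all mainstream models; reinforcement learning overcomes the bottleneck that gradient-descent training does not match the discrete language-generation evaluation metrics; knowledge graphs incorporate physicians' prior knowledge of diseases and effectively improve the clinical accuracy of generated reports. In addition, newer architectures such as Transformer are increasingly replacing recurrent neural networks (RNN) and even convolutional neural networks (CNN) as the network backbone. Finally, the paper discusses the problems that deep medical image captioning still has to solve and future research directions, in the hope of moving the technique toward real deployment.

Writing medical image captions is a labor-intensive daily task for radiologists. The emerging deep medical image captioning technique has the potential to generate medical captions automatically. The survey sets out to address three challenges: 1) organizing a feasible and clear structure for readers; 2) strengthening the treatment of the deep medical image captioning task itself; 3) refining the methods introduced. First, the aims and objectives are identified. Then, the literature on deep medical image captioning up to 2021 is reviewed, including the latest methods, datasets and evaluation metrics, together with a comparative analysis of the medical and generic image captioning tasks. Deep image captioning techniques are introduced according to their network structure. Current deep medical image captioning is mainly developed by extending the encoder-decoder structure, for example by adding retrieval-based methods, template-matching methods, attention mechanisms, reinforcement learning and knowledge graphs. Specifically, the encoder-decoder structure typically uses a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for caption generation, and the two networks are linked by an intermediate vector called the context vector. Some of these models use a CNN-RNN-RNN structure, called a hierarchical RNN or hierarchical long short-term memory (LSTM).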
This structure stacks two kinds of RNN, which generate topic vectors and captions respectively, with caption generation guided by the topic vector. Retrieval-based and template-matching methods are relatively simple, yet they remain effective because medical captions have a high rate of repetition and distinctive sentence patterns. The attention mechanism lets the model focus on a particular part of the image and sentence when a caption is generated, and the length of the context vector becomes variable. Reinforcement learning (RL) applied to medical image captioning alleviates the mismatch between gradient-descent training and the discrete language-generation evaluation metrics. RL can also be used in a multi-agent setting to guide the form of the output before the decoder runs, yielding well-balanced and logical medical content. A knowledge graph integrates expert prior knowledge into the model: diseases with similar features sit at nearby nodes in the graph, and disease information is updated through graph convolution. Integrating a medical knowledge graph aims at improving the clinical accuracy of the generated report. These methods are compatible with one another; for example, template matching and attention-based RL can be used together. In addition, Transformer-related structures have been developing rapidly as a new backbone network beyond RNN and CNN. The Transformer, or self-attention block, can be trained in parallel and captures long-distance dependencies between tokens, which makes it a better feature extractor. Popular datasets for deep medical image captioning are IU X-Ray and MIMIC-CXR, which contain frontal and lateral chest X-ray images paired with reports made up of multiple sentences.
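The attention step described above can be illustrated with a minimal sketch, given here only as an assumption-laden example and not as the method of any surveyed model. It uses scaled dot-product attention, the variant used in Transformer blocks; the function names `softmax` and `attend`, the toy region features and the query vector are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, region_features):
    """Scaled dot-product attention for one decoding step.

    query: decoder state (hypothetical), a list of floats.
    region_features: one feature vector per image region, same length as query.
    Returns the attention weights and the weighted-sum context vector.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in region_features
    ]
    weights = softmax(scores)
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_features))
        for i in range(len(region_features[0]))
    ]
    return weights, context


# Toy example: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(query, regions)
```

In this toy case the first and third regions match the query equally well and receive equal weight, while the second region receives less. A real captioning model would learn the projections that produce the query and region features.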
Medical annotations such as medical subject headings (MeSH) or unified medical language system (UMLS) keywords help generate more accurate reports because they can be treated as extra information, and classifying these tags can serve as a pretraining task. Generic natural language generation (NLG) metrics are applied to evaluate the reports generated by deep medical image captioning models. Beyond the existing BLEU-n, ROUGE, METEOR and CIDEr scores, newer metrics such as SPICE, SPIDEr and BERTScore have been developed. Finally, future research directions are outlined in four aspects: 1) More diverse and more accurate datasets, covering other modalities such as magnetic resonance imaging (MRI) and color Doppler ultrasound. Current datasets mostly focus on chest X-rays, that is, a single body part and a single modality, so broader data would make models more robust and adaptable to various tasks. 2) Evaluation metrics that are more clinically accurate and cost-effective than generic NLG metrics such as BLEU or ROUGE. Existing generic NLG metrics are not the best fit for medicine, and better metrics would reduce the demand on radiologists' time. 3) Unsupervised and semi-supervised methods to lower the dataset-related cost of the medical image captioning task, building on existing pretrained models such as ViLBERT and VL-BERT to reduce cost and the number of training samples needed. 4) Integration of more prior knowledge into the model, and more detailed multi-round conversational medical report generation.
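To make the BLEU-n metric mentioned in the abstract concrete, the following minimal sketch computes only its core ingredient, the clipped (modified) n-gram precision, for a single candidate and a single reference. Full BLEU additionally combines several n-gram orders with a geometric mean and applies a brevity penalty, both omitted here. The function names and the two example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference.
    """
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Hypothetical report sentences, tokenized on whitespace.
reference = "the heart size is normal".split()
candidate = "the heart is normal".split()
p1 = modified_precision(candidate, reference, 1)  # all 4 unigrams appear in the reference
p2 = modified_precision(candidate, reference, 2)  # 2 of 3 bigrams appear in the reference
```

The example also hints at why the abstract calls for clinically oriented metrics: a candidate that dropped or flipped one clinically decisive word could still score highly on n-gram overlap.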