Computer science
Overfitting
Artificial intelligence
Leverage (statistics)
Feature (linguistics)
Annotation
Extractor
Machine learning
Feature learning
Translation
Natural language processing
Pattern recognition (psychology)
Artificial neural network
Programming language
Engineering
Philosophy
Linguistics
Process engineering
Authors
Bo Liu,Li-Ming Zhan,Xiao-Ming Wu
Identifier
DOI:10.1007/978-3-030-87196-3_20
Abstract
One of the primary challenges facing medical visual question answering (Med-VQA) is the lack of large-scale well-annotated datasets for training. To overcome this challenge, this paper proposes a two-stage pre-training framework by learning transferable feature representations of radiology images and distilling a lightweight visual feature extractor for Med-VQA. Specifically, we leverage large amounts of unlabeled radiology images to train three teacher models for the body regions of brain, chest, and abdomen respectively via contrastive learning. Then, we distill the teacher models to a lightweight student model that can be used as a universal visual feature extractor for any Med-VQA system. The lightweight feature extractor can be readily fine-tuned on the training radiology images of any Med-VQA dataset, saving the annotation effort while preventing overfitting to small-scale training data. The effectiveness and advantages of the pre-trained model are demonstrated by extensive experiments with state-of-the-art Med-VQA methods on existing benchmarks. The source code and the pre-training dataset can be downloaded from https://github.com/awenbocc/cprd.
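The abstract describes two training signals: a contrastive objective used to pre-train the region-specific teacher models on unlabeled radiology images, and a distillation objective that transfers the teachers' representations into one lightweight student. As a rough illustration of what such losses look like, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss and a feature-matching distillation loss; the function names and exact formulations are illustrative assumptions, not the paper's implementation (see the linked repository for the actual code).

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE-style contrastive loss between two batches of embeddings.

    z1, z2: (N, D) arrays where z1[i] and z2[i] are embeddings of two
    augmented views of the same image; all other pairs act as negatives.
    (Illustrative; the paper's contrastive formulation may differ.)
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature            # (N, N) similarity matrix
    # Positives sit on the diagonal; log-softmax over each row.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def feature_distillation_loss(student_feats, teacher_feats):
    """Mean-squared feature-matching loss: the student is trained to
    reproduce the (region-appropriate) teacher's representations.
    (A common distillation choice, assumed here for illustration.)
    """
    return np.mean((student_feats - teacher_feats) ** 2)
```

In the described framework, a loss like `info_nce_loss` would be minimized separately for the brain, chest, and abdomen teachers on unlabeled images of that region, and a loss like `feature_distillation_loss` would then compress all three teachers into the single student used as the universal Med-VQA feature extractor.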