Computer science
Artificial intelligence
Semantics (computer science)
Similarity (geometry)
Unified Medical Language System
Natural language processing
Classification
Field (mathematics)
Contextual image classification
Image retrieval
Natural language
Image (mathematics)
Pattern recognition (psychology)
Mathematics
Pure mathematics
Programming language
Authors
Bo Liu,Donghuan Lu,Wei Dong,Xian Wang,Yan Wang,Yongfeng Zhang,Yufeng Zheng
Identifier
DOI: 10.1109/TMI.2023.3294980
Abstract
Medical contrastive vision-language pretraining has shown great promise in many downstream tasks, such as data-efficient and zero-shot recognition. Current studies pretrain the network with a contrastive loss that treats paired image-report tuples as positive samples and all unpaired ones as negative samples. However, unlike natural-image datasets, medical images or reports from different cases can be highly similar, especially among normal cases, so treating every unpaired tuple as a negative can undermine the learned semantic structure and degrade the representations. We therefore design a simple yet effective approach for better contrastive learning in the medical vision-language field. Specifically, by simplifying the computation of similarity between medical image-report pairs into the calculation of inter-report similarity, the image-report tuples are divided into positive, negative, and an additional neutral group. With this finer categorization of samples, a more suitable contrastive loss is constructed. For evaluation, we apply the proposed model-agnostic strategy to two state-of-the-art pretraining frameworks. Consistent improvements on four common downstream tasks, including cross-modal retrieval, zero-shot and data-efficient image classification, and image segmentation, demonstrate the effectiveness of the proposed strategy in the medical field.
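The abstract describes the grouping idea but not its exact formulation. The following is a minimal PyTorch sketch of one plausible reading: inter-report cosine similarity stands in for image-report pair similarity, two hypothetical thresholds (tau_pos, tau_neutral) split tuples into positive, neutral, and negative groups, and neutral tuples are simply excluded from a multi-positive InfoNCE loss. The function name, thresholds, and the single image-to-report direction are all illustrative assumptions, not the paper's implementation.

    # Sketch of a neutral-aware contrastive loss; thresholds and the
    # multi-positive InfoNCE form are assumptions, not the paper's method.
    import torch
    import torch.nn.functional as F

    def neutral_aware_contrastive_loss(img_emb, rep_emb, tau_pos=0.9,
                                       tau_neutral=0.7, temperature=0.07):
        """Divide image-report tuples into positive, neutral, and negative
        groups via inter-report similarity, then contrast only pos/neg."""
        img_emb = F.normalize(img_emb, dim=-1)   # (B, D) image embeddings
        rep_emb = F.normalize(rep_emb, dim=-1)   # (B, D) report embeddings

        # Inter-report similarity replaces image-report pair similarity.
        rep_sim = rep_emb @ rep_emb.t()          # (B, B)

        # Paired tuples (diagonal) and highly similar reports are positives;
        # moderately similar reports form the neutral group.
        pos_mask = rep_sim >= tau_pos
        pos_mask.fill_diagonal_(True)
        neutral_mask = (rep_sim >= tau_neutral) & ~pos_mask

        # Image-to-report logits; neutral tuples are dropped from the loss
        # by masking them out of the softmax denominator.
        logits = img_emb @ rep_emb.t() / temperature
        logits = logits.masked_fill(neutral_mask, float('-inf'))

        # Multi-positive InfoNCE: average log-likelihood over positives.
        log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
        loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1)
        return loss.mean()

Excluding neutral tuples from the denominator, rather than relabeling them as positives or negatives, avoids pushing semantically close cases apart while also not forcing them together; a full implementation would typically add the symmetric report-to-image term.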