Computer science
Artificial intelligence
Image (mathematics)
Encoder
Pattern recognition (psychology)
Class (philosophy)
Medical imaging
Similarity (geometry)
Natural language processing
Machine learning
Operating system
Authors
Yuhao Zhang, Hang Jiang, Yasuhide Miura, Christopher D. Manning, Curtis P. Langlotz
Source
Journal: Cornell University - arXiv
Date: 2020-01-01
Citations: 225
Identifier
DOI: 10.48550/arxiv.2010.00747
Abstract
Learning visual representations of medical images (e.g., X-rays) is core to medical image understanding but its progress has been held back by the scarcity of human annotations. Existing work commonly relies on fine-tuning weights transferred from ImageNet pretraining, which is suboptimal due to drastically different image characteristics, or rule-based label extraction from the textual report data paired with medical images, which is inaccurate and hard to generalize. Meanwhile, several recent studies show exciting results from unsupervised contrastive learning from natural images, but we find these methods help little on medical images because of their high inter-class similarity. We propose ConVIRT, an alternative unsupervised strategy to learn medical visual representations by exploiting naturally occurring paired descriptive text. Our new method of pretraining medical image encoders with the paired text data via a bidirectional contrastive objective between the two modalities is domain-agnostic, and requires no additional expert input. We test ConVIRT by transferring our pretrained weights to 4 medical image classification tasks and 2 zero-shot retrieval tasks, and show that it leads to image representations that considerably outperform strong baselines in most settings. Notably, in all 4 classification tasks, our method requires only 10% as much labeled training data as an ImageNet initialized counterpart to achieve better or comparable performance, demonstrating superior data efficiency.
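The bidirectional contrastive objective mentioned in the abstract pairs each image embedding with its matched report embedding and contrasts it against the other pairs in the batch, in both the image-to-text and text-to-image directions. A minimal NumPy sketch of such a symmetric InfoNCE-style loss is shown below; the temperature value, embedding shapes, and function name are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import numpy as np

def bidirectional_contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric contrastive loss over paired image/text embeddings.

    img_emb, txt_emb: (N, d) arrays; row i of each forms a matched pair.
    Hypothetical sketch: temperature and shapes are assumptions.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (N, N) pairwise similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        # Row-wise log-softmax; targets are the diagonal (matched pairs)
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

As a sanity check, embeddings whose rows are correctly paired should yield a lower loss than the same embeddings with the pairing scrambled, since the diagonal of the similarity matrix then dominates each row.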