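Computer science
Artificial intelligence
Image (mathematics)
Pattern recognition (psychology)
Synthetic data
Domain (mathematical analysis)
Machine learning
Mathematics
Mathematical analysis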
Authors
Pedro Osorio,Guillermo Jiménez-Pérez,Javier Montalt‐Tordera,Jens Hooge,Guillem Duran-Ballester,Shivam Singh,Moritz Radbruch,Udo Bach,Sabrina Schroeder,Krystyna Siudak,Julia Vienenkoetter,Bettina Lawrenz,Sadegh Mohammadi
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Identifier
DOI: 10.48550/arxiv.2312.09792
Abstract
Artificial Intelligence (AI) based image analysis has an immense potential to support diagnostic histopathology, including cancer diagnostics. However, developing supervised AI methods requires large-scale annotated datasets. A potentially powerful solution is to augment training data with synthetic data. Latent diffusion models, which can generate high-quality, diverse synthetic images, are promising. However, the most common implementations rely on detailed textual descriptions, which are not generally available in this domain. This work proposes a method that constructs structured textual prompts from automatically extracted image features. We experiment with the PCam dataset, composed of tissue patches only loosely annotated as healthy or cancerous. We show that including image-derived features in the prompt, as opposed to only healthy and cancerous labels, improves the Fréchet Inception Distance (FID) from 178.8 to 90.2. We also show that pathologists find it challenging to detect synthetic images, with a median sensitivity/specificity of 0.55/0.55. Finally, we show that synthetic data effectively trains AI models.
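
The sensitivity/specificity figures quoted above can be read more easily with a small worked example. The sketch below is not the authors' evaluation code; it assumes, for illustration only, that the positive class (label 1) means "synthetic image" and the negative class (label 0) means "real image", and the function name and toy labels are hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption for this sketch: 1 = synthetic image, 0 = real image.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # synthetic, called synthetic
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # synthetic, called real
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real, called real
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real, called synthetic

    # Guard against an empty class; NaN signals "undefined".
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy reader study: 4 synthetic and 4 real patches (made-up labels).
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader = [1, 1, 0, 0, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(truth, reader)
```

Under this convention, values near 0.5 for both quantities on a balanced set are what a reader guessing at random would be expected to produce, which is the sense in which the abstract describes detection as challenging.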