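Lung cancer
Generalization
Pathological
Artificial intelligence
Medicine
Computer science
Nodule (geology)
Lung
Pattern recognition (psychology)
Radiology
Machine learning
Pathology
Mathematics
Internal medicine
Mathematical analysis
Paleontology
Biology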
Authors
Yanbo Shao,Minghao Wang,Juanyun Mai,Xinliang Fu,Mei Li,Jiayin Zheng,Zhaoqi Diao,Airu Yin,Yulong Chen,Jianyu Xiao,Jian You,Yang Yang,Xiangcheng Qiu,Jinsheng Tao,Bo Wang,Hua Ji
Identifier
DOI:10.1007/978-3-031-16437-8_74
Abstract
Lung cancer is one of the most lethal cancers worldwide. Computed tomography (CT) makes it possible to diagnose lung cancer at an early stage, which can significantly reduce its mortality. In recent years, deep neural networks (DNNs) have been widely used to improve the accuracy of classifying pulmonary nodules as benign or malignant. A limitation of the DNN approach, however, is that a model's performance and generalization depend heavily on the size and quality of the training data. To the best of our knowledge, almost all existing public lung nodule datasets, e.g., LIDC-IDRI, obtain the crucial benign and malignant labels by radiographic analysis rather than pathological examination. In this paper, we argue that, because the labels are not confirmed by a pathology report and therefore lack authenticity, machine-learning (ML) models based on LIDC-IDRI generalize poorly. To test this hypothesis, we introduce a new lung CT image dataset with pathological information (LIDP) for lung cancer screening. LIDP contains 990 samples: 783 malignant and 207 benign. More critically, the labels of all samples have been confirmed by pathological biopsy. We evaluate various existing LIDC-based state-of-the-art (SOTA) models on LIDP. Our experimental results show that existing SOTA models trained on the LIDC-IDRI dataset generalize extremely poorly to LIDP. Our conclusion is striking: the distributions of these datasets are significantly different. We claim that the LIDP dataset is a valuable addition to existing datasets such as LIDC-IDRI. LIDP can be used for independent testing or for training new ML models for early detection of lung cancer.
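
The class counts quoted in the abstract (783 malignant, 207 benign) imply a noticeably imbalanced test set. The following is a minimal sketch, not taken from the paper, of what that imbalance means for a trivial classifier that always predicts the majority class. The function names are hypothetical, and the choice of balanced accuracy as a companion metric is an assumption made here for illustration; the abstract does not say which metrics the authors report.

```python
# Class counts as stated in the abstract; everything else is illustrative.
COUNTS = {"malignant": 783, "benign": 207}


def class_shares(counts):
    """Return each class's fraction of the whole dataset."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Score a classifier that always predicts the most common class.

    Returns (majority_label, accuracy, balanced_accuracy).
    """
    total = sum(counts.values())
    majority = max(counts, key=counts.get)
    accuracy = counts[majority] / total
    # Recall is 1.0 on the majority class and 0.0 on every other class,
    # so the mean per-class recall is 1 / number_of_classes.
    balanced_accuracy = 1.0 / len(counts)
    return majority, accuracy, balanced_accuracy


if __name__ == "__main__":
    for label, share in class_shares(COUNTS).items():
        print(f"{label}: {share:.1%}")
    label, acc, bal_acc = majority_baseline(COUNTS)
    print(f"always predict {label}: accuracy={acc:.3f}, balanced accuracy={bal_acc:.3f}")
```

Under these counts, plain accuracy for the trivial baseline is 783/990, so a metric that weights both classes equally may give a clearer picture when comparing models on a set like this.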