LIDP: A Lung Image Dataset with Pathological Information for Lung Cancer Screening

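Lung cancer, Generalization, Pathological, Artificial intelligence, Medicine, Computer science, Nodule (geology), Pattern recognition (psychology), Radiology, Machine learning, Pathology, Mathematics, Internal medicine, Paleontology, Mathematical analysis, Biology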
Authors
Yanbo Shao,Minghao Wang,Juanyun Mai,Xinliang Fu,Mei Li,Jiayin Zheng,Zhaoqi Diao,Airu Yin,Yulong Chen,Jianyu Xiao,Jian You,Yang Yang,Xiangcheng Qiu,Jinsheng Tao,Bo Wang,Hua Ji
Source
Journal: Lecture Notes in Computer Science Volume/Issue: : 770-779 Cited by: 2
Identifier
DOI:10.1007/978-3-031-16437-8_74
Abstract

Lung cancer is one of the most lethal cancers worldwide. Computed tomography (CT) makes it possible to diagnose lung cancer at an early stage, which can significantly reduce mortality. In recent years, deep neural networks (DNNs) have been widely used to improve the accuracy of classifying pulmonary nodules as benign or malignant. A limitation of the DNN approach, however, is that a model's performance and generalization depend heavily on the size and quality of the training data. To the best of our knowledge, almost all existing public lung nodule datasets, e.g., LIDC-IDRI, obtain the crucial benign and malignant labels by radiographic analysis rather than pathological examination. In this paper, we argue that, without pathology reports and hence without authenticated labels, machine-learning (ML) models based on LIDC-IDRI generalize poorly. To test this hypothesis, we introduce a new lung CT image dataset with pathological information (LIDP) for lung cancer screening. LIDP contains 990 samples: 783 malignant and 207 benign. More critically, the labels of all samples have been confirmed by pathological biopsy. We evaluate various existing LIDC-based state-of-the-art (SOTA) models on LIDP. Our experimental results show that existing SOTA models trained on the LIDC-IDRI dataset generalize extremely poorly. Our conclusion is striking: the distributions of these datasets are significantly different. We claim that LIDP is a very valuable addition to existing datasets such as LIDC-IDRI, and that it can be used for independent testing or for training new ML models for early lung cancer detection.
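The class counts quoted in the abstract (783 malignant, 207 benign) imply a skewed label distribution, which matters when a model is scored on LIDP as an independent test set. The following is a minimal sketch, not the authors' evaluation protocol: it uses only the two counts from the abstract, and the "always malignant" classifier and the helper names `class_balance` and `balanced_accuracy` are hypothetical, introduced purely for illustration.

```python
# Counts taken from the abstract; everything else is an illustrative assumption.
N_MALIGNANT = 783
N_BENIGN = 207


def class_balance(n_pos: int, n_neg: int) -> tuple[float, float]:
    """Return the fraction of positive and negative samples."""
    total = n_pos + n_neg
    return n_pos / total, n_neg / total


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


total = N_MALIGNANT + N_BENIGN
malignant_frac, benign_frac = class_balance(N_MALIGNANT, N_BENIGN)

# Hypothetical degenerate model that labels every nodule as malignant.
always_malignant_accuracy = N_MALIGNANT / total
always_malignant_balanced = balanced_accuracy(
    tp=N_MALIGNANT, fn=0, tn=0, fp=N_BENIGN
)

print(f"total samples: {total}")
print(f"malignant fraction: {malignant_frac:.3f}")
print(f"benign fraction: {benign_frac:.3f}")
print(f"plain accuracy of 'always malignant': {always_malignant_accuracy:.3f}")
print(f"balanced accuracy of 'always malignant': {always_malignant_balanced:.3f}")
```

Under these assumptions, a model that ignores the image entirely can still reach a high plain accuracy on this class mix, so a class-aware metric such as balanced accuracy may be a safer way to read generalization results on a dataset with this distribution.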
