Computer science
Overfitting
Generalizability theory
Artificial intelligence
Preprocessor
DICOM
Machine learning
Deep learning
Lookup table
Data mining
Raw data
Pixel
Artificial neural network
Programming language
Statistics
Mathematics
Authors
Theo Dapamede,Frank Li,Bardia Khosravi,Saptarshi Purkayastha,Hari Trivedi,Judy Wawira Gichoya
Identifier
DOI:10.1007/s10278-025-01418-5
Abstract
Image pre-processing has a significant impact on the performance of deep learning models in medicine; yet, there is no standardized method for DICOM pre-processing. In this study, we investigate the impact of two commonly used image pre-processing techniques, histogram equalization (HE) and values-of-interest look-up-table (VOI-LUT) transformations, on the performance of deep learning classifiers for chest X-rays (CXR). We generated two baseline datasets (raw pixel and standard DICOM-processed) from our internal CXR dataset and then enhanced both with HE to create four distinct datasets. Four independent deep learning models for the diagnosis of pneumothorax were trained and evaluated on two external datasets. The results reveal that HE enhancement significantly affects model performance, particularly in terms of generalizability. Models trained solely on HE-enhanced datasets exhibit poorer performance on external validation sets, suggesting potential overfitting and information loss. These models also exhibit shortcut learning, relying on spurious correlations in the training data for their predictions. This study highlights the importance of machine learning practitioners being aware of the preprocessing techniques applied to datasets and their potential impact on model performance, as well as the need to include preprocessing information when sharing datasets. Additionally, this research underscores the necessity of using pixel values closer to clinical standards during dataset curation to improve model robustness and mitigate the risk of information loss.
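To make the two preprocessing paths mentioned in the abstract concrete, the sketch below shows one possible way to derive the four dataset variants (raw pixel, VOI-LUT, and their HE-enhanced counterparts) from a single DICOM chest X-ray. This is an illustrative reconstruction, not the authors' pipeline: it assumes pydicom's `apply_voi_lut` and scikit-image's `equalize_hist`, that the DICOM file carries VOI LUT or windowing metadata, and the file path and function names are hypothetical.

```python
# Minimal sketch (not the authors' code): build the four preprocessing
# variants described in the abstract from one DICOM chest X-ray.
import numpy as np
import pydicom
from pydicom.pixel_data_handlers.util import apply_voi_lut
from skimage import exposure


def to_float01(arr: np.ndarray) -> np.ndarray:
    """Rescale an integer pixel array to float32 in [0, 1]."""
    arr = arr.astype(np.float32)
    lo, hi = arr.min(), arr.max()
    return (arr - lo) / (hi - lo) if hi > lo else np.zeros_like(arr)


def preprocess_variants(dicom_path: str) -> dict[str, np.ndarray]:
    ds = pydicom.dcmread(dicom_path)

    # Variant 1: raw pixel values, only rescaled to [0, 1].
    raw = to_float01(ds.pixel_array)

    # Variant 2: standard DICOM processing via the VOI LUT / windowing metadata.
    voi = to_float01(apply_voi_lut(ds.pixel_array, ds))

    # MONOCHROME1 images store inverted intensities; flip so bone appears bright.
    if ds.get("PhotometricInterpretation", "") == "MONOCHROME1":
        raw, voi = 1.0 - raw, 1.0 - voi

    # Variants 3 and 4: histogram-equalized versions of the two baselines.
    return {
        "raw": raw,
        "voi_lut": voi,
        "raw_he": exposure.equalize_hist(raw),
        "voi_lut_he": exposure.equalize_hist(voi),
    }


if __name__ == "__main__":
    variants = preprocess_variants("example_cxr.dcm")  # hypothetical file
    for name, img in variants.items():
        print(name, img.shape, float(img.min()), float(img.max()))
```

In this hypothetical setup, each variant would feed a separate classifier, mirroring the abstract's design of four independent models trained on the four datasets.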