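Artificial intelligence
Receiver operating characteristic
Linear discriminant analysis
Computer science
Resampling
Algorithm
Mathematics
Pattern recognition (psychology)
Machine learning
Population
Statistics
Medicine
Environmental health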
Authors
Frederik Christensen, Deniz Kenan Kılıç, Izabela Nielsen, Tarec Christoffer El‐Galaly, Andreas Glenthøj, Jens Helby, Henrik Frederiksen, Sören Möller, Alexander Djupnes Fuglkjær
Identifier
DOI: 10.1016/j.cmpb.2024.108581
Abstract
About 7% of the global population has a congenital hemoglobin disorder, and there are more than 300,000 new cases of α-thalassemia each year. In low-income regions, diagnosis is costly and inaccurate and often relies on complete blood count (CBC) tests. This study uses machine learning (ML) to classify α-thalassemia traits from sex and CBC parameters and examines the effect of grouping silent carriers with non-carriers. The dataset comprises 288 individuals from Sri Lanka with suspected α-thalassemia, who were classified using eleven discriminant formulae and nine ML models. Outliers were removed using the Mahalanobis distance, and the data were resampled with the synthetic minority oversampling technique (SMOTE) and SMOTE-Nominal Continuous (SMOTE-NC). The Mann-Whitney U test was used for feature extraction and class grouping. ML performance was evaluated with eight criteria. When silent carriers and non-carriers were grouped, the Ehsani formula achieved an area under the receiver operating characteristic curve (ROC-AUC) of 0.66. The convolutional neural network (CNN) without feature extraction performed better, with an accuracy of 0.85, a sensitivity of 0.8, a specificity of 0.86, and a micro-/macro-averaged ROC-AUC of 0.95/0.93. This performance was maintained even without preprocessing. ML models outperformed the classical discriminant formulae in classifying α-thalassemia from sex and CBC features. A larger dataset could improve the generalization of the ML models and strengthen the effect of feature extraction. Grouping silent carriers with non-carriers improved the ML results, especially when combined with resampling. Silent carriers could not be separated from non-carriers using the available features.
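The abstract states that outliers were removed using the Mahalanobis distance but does not give the cutoff. The following is a minimal NumPy sketch of that idea, assuming a fixed threshold on the squared distance. The helper name `mahalanobis_mask`, the toy data, and the threshold of 13.8 (roughly the 0.999 chi-square quantile for two features) are illustrative assumptions, not values from the study.

```python
import numpy as np


def mahalanobis_mask(X, threshold):
    """Hypothetical helper: return (keep_mask, d2), keeping rows whose squared
    Mahalanobis distance from the sample mean is at most `threshold`."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)       # features are columns
    inv_cov = np.linalg.pinv(cov)       # pseudo-inverse tolerates a singular covariance
    diff = X - mu
    # d2_i = diff_i^T * inv_cov * diff_i for every row i
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2 <= threshold, d2


# Toy data (not from the study): 50 standard-normal points plus one far-away point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])

# Assumed cutoff: about the 0.999 quantile of chi-square with 2 degrees of freedom.
mask, d2 = mahalanobis_mask(X, threshold=13.8)
X_clean = X[mask]
```

Because the outlier itself pulls the mean and inflates the covariance, an extreme point can partly mask itself; robust estimators such as the minimum covariance determinant reduce this effect.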
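SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below assumes purely continuous features and Euclidean distance; the name `smote_sketch` and the toy data are hypothetical. SMOTE-NC, which the abstract also mentions, additionally handles nominal features such as sex, which this sketch does not; the imbalanced-learn library provides `SMOTE` and `SMOTENC` implementations for real use.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic samples from minority-class rows X_min.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (continuous features only).
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class; exclude self-matches.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                   # random seed samples
    nn = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nn] - X_min[base])


# Toy minority class: the corners of the unit square.
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=10, k=2, rng=0)
```

Since every synthetic point is a convex combination of two minority samples, resampling fills in the minority region rather than duplicating rows.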
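The abstract reports ROC-AUC as micro/macro averages. For a multiclass problem evaluated one-vs-rest, the macro average is the mean of the per-class AUCs, while the micro average pools every (sample, class) score into a single binary problem. The sketch below computes both with NumPy, using the fact that the binary ROC-AUC equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. The function names and toy scores are illustrative assumptions, not the study's evaluation code.

```python
import numpy as np


def binary_auc(y_true, score):
    """ROC-AUC as the normalized Mann-Whitney U statistic:
    P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    # Compare all positive-negative pairs; O(n_pos * n_neg) is fine for a few hundred samples.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))


def micro_macro_auc(y, proba):
    """One-vs-rest ROC-AUC for integer labels y and an (n_samples, n_classes) score matrix."""
    proba = np.asarray(proba, dtype=float)
    onehot = np.eye(proba.shape[1], dtype=bool)[np.asarray(y)]
    per_class = [binary_auc(onehot[:, k], proba[:, k]) for k in range(proba.shape[1])]
    macro = float(np.mean(per_class))                  # average of per-class AUCs
    micro = binary_auc(onehot.ravel(), proba.ravel())  # pool all (sample, class) pairs
    return micro, macro


# Binary check: 3 of the 4 positive-negative pairs are ordered correctly.
print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# Toy 3-class scores that rank every true class highest.
y = [0, 0, 1, 1, 2, 2]
proba = np.full((6, 3), 0.1)
proba[np.arange(6), y] = 0.8
micro, macro = micro_macro_auc(y, proba)
```

The two averages diverge when classes differ in size or difficulty: macro weights each class equally, whereas micro is dominated by the more numerous (sample, class) pairs.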