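Overfitting
Random forest
Support vector machine
Artificial intelligence
Feature selection
Computer science
Machine learning
Receiver operating characteristic
Logistic regression
Pattern recognition (psychology)
Artificial neural network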
Authors
Jianqiang Li,Lu Liu,Jingchao Sun,Haowen Mo,Ji-Jiang Yang,Shi Chen,Huiting Liu,Qing Wang,Hui Pan
Identifier
DOI:10.1109/tbdata.2016.2620981
Abstract
Diagnosing infants who are small for gestational age (SGA) at an early stage could help physicians introduce interventions sooner. Machine learning (ML) is envisioned as a tool for identifying SGA infants, but it has not been widely studied in this field. To develop effective SGA prediction models, we conducted four groups of experiments that considered basic ML methods, imbalanced data, feature selection, and the time characteristics of variables, respectively. SGA data on infants with gestational ages between 24 and 42 weeks, collected from 2010 to 2013, were used. Support vector machine (SVM), random forest (RF), logistic regression (LR), and Sparse LR models were trained with 10-fold cross-validation. Precision and the area under the receiver operating characteristic curve (AUC) were evaluated. In each group, SVM and Sparse LR performed similarly well. LR without any sparsity penalty performed worst, possibly because of overfitting. When the handling of imbalanced data was combined with feature selection, the RF ensemble classifier performed best, and with the help of expert knowledge it obtained the highest AUC value (0.8547). In other cases, RF performed worse than Sparse LR and SVM, possibly because of fully grown trees.
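The two metrics named in the abstract, precision and AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names `precision` and `roc_auc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here through its rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, imbalanced on purpose: 1 = SGA, 0 = not SGA.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.7, 0.6, 0.9]

# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"precision = {precision(y_true, y_pred):.4f}")
print(f"AUC       = {roc_auc(y_true, scores):.4f}")
```

Precision depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together for imbalanced data such as this.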