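Imputation (statistics)
Random forest
Missing data
Computer science
Oversampling
Data mining
Feature selection
Data type
Naive Bayes classifier
Artificial intelligence
Statistics
Machine learning
Mathematics
Support vector machine
Computer network
Programming language
Bandwidth (computing)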
Authors
Ren Lijuan, Aicha Sekhari Seklouli, Haiqing Zhang, Tao Wang, Abdelaziz Bouras
Identifier
DOI:10.1016/j.is.2022.102122
Abstract
The growing use of information technology in the medical field is producing large amounts of medical data. Because participants withdraw early or refuse to take part, these data contain many missing values. Although various methods for handling missing values have been proposed, few address medical data that are both imbalanced and of mixed type. In this work, we propose an adaptive Laplacian weight random forest, called ALWRF. In ALWRF, feature weights are adjusted dynamically during model construction, which increases the selection probabilities of features with a low Laplacian score and high importance. A random operator is also introduced to increase the diversity of the trees. Furthermore, we propose an imputation method for imbalanced, mixed-type data, called SncALWRFI, based on the SMOTE-NC oversampling technique and ALWRF. Bayesian optimization and cross-validation are used to search for optimal parameters. The experimental results show that ALWRF outperforms random forest and Bayesian-optimized random forest in classification and regression accuracy. In the missing-value experiments, SncALWRFI achieved the best imputation accuracy and imputed effectively on public datasets that are imbalanced and of mixed type.
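The abstract does not give the ALWRF weighting formula, so the following is only an illustrative sketch of the general idea of favouring features with a low Laplacian score and high importance. It computes the standard Laplacian score on a dense RBF similarity graph and then combines it with externally supplied importances. The function names, the dense graph, the kernel width `t`, and the additive combination rule are all assumptions made for illustration, not the authors' method.

```python
import math


def laplacian_scores(X, t=1.0):
    """Laplacian score of each column of X (lower = better locality preservation).

    Assumption: a dense RBF graph S_ij = exp(-||x_i - x_j||^2 / t) is used
    instead of a k-nearest-neighbour graph, to keep the sketch short.
    """
    n, d = len(X), len(X[0])
    S = [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / t)
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in S]          # diagonal of the degree matrix D
    total = sum(deg)
    scores = []
    for r in range(d):
        f = [row[r] for row in X]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, deg)) / total
        ft = [fi - mean for fi in f]
        # f^T L f, written as half the weighted sum of squared differences.
        num = 0.5 * sum(S[i][j] * (ft[i] - ft[j]) ** 2
                        for i in range(n) for j in range(n))
        den = sum(di * v * v for di, v in zip(deg, ft))   # f^T D f
        # Assumption: a constant feature gets a large (bad) score of 2.0.
        scores.append(num / den if den > 0 else 2.0)
    return scores


def _min_max(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def feature_sampling_probs(lap_scores, importances, eps=1e-12):
    """Hypothetical rule: low Laplacian score and high importance raise the
    probability that a feature is offered as a split candidate."""
    lap = _min_max(list(lap_scores))
    imp = _min_max(list(importances))
    weights = [(1.0 - l) + i + eps for l, i in zip(lap, imp)]
    total = sum(weights)
    return [w / total for w in weights]
```

In a forest built this way, the returned probabilities could be passed to a weighted sampler such as `random.choices` when drawing candidate features for each split, and recomputed as importances change during training.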