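Logistic regression
Medicine
Machine learning
Cluster analysis
Artificial intelligence
Random forest
Ranking (information retrieval)
Gradient boosting
Population
Computer science
Environmental health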
Authors
Xiyue Liao, David Kerr, Jessikah Morales, Ian Duncan
Identifier
DOI:10.1089/dia.2018.0390
Abstract
Aims: The aim of this study is to compare several machine learning methods with traditional parametric statistical analysis using logistic regression. The comparison investigates the relationship between risk factors and the risk of diabetes and cardiovascular disease (cardiometabolic risk) in U.S. adults, using cross-sectional data from participants in a wellness improvement program.
Methods: Logistic regression was used to estimate the relationship between individual risk factors (predictors) and cardiometabolic risk. Supervised machine learning methods were used to predict risk and to rank the importance of the variables. A clustering method was used to identify subpopulations of interest. Predictors were divided into nonmodifiable and modifiable factors.
Results: The population comprised 217,254 adults, of whom 8.1% had diabetes. Logistic regression identified six variables that were negatively related and eleven that were positively related to cardiometabolic risk. Three supervised machine learning classifiers (random forest, gradient boosting, and bagging) were applied, with an average AUC of 0.806. Each classifier also produced a ranking of variable importance. A k-medoids clustering algorithm identified four subgroups, distinguished mainly by gender and diabetes status.
Conclusions: The study illustrates that machine learning is an important addition to traditional logistic regression for identifying important cardiometabolic risk factors and ranking their importance. It also illustrates the potential for individual-level interventions based on lifestyle and medications.
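The abstract summarizes classifier performance as an average AUC and mentions a k-medoids clustering step. It does not give the authors' code, software, distance measure, or k-medoids variant. The following is therefore only a minimal, standard-library sketch of what those two ideas compute, under stated assumptions.

`roc_auc` uses the Mann-Whitney interpretation of AUC: the probability that a randomly chosen positive case scores above a randomly chosen negative one. `k_medoids` uses the simple alternating (Voronoi-iteration) variant, not the PAM swap search, with Manhattan distance. Both choices are assumptions; mixed categorical and numeric predictors such as gender would likely call for a different dissimilarity. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case is scored above a random negative.

    Ties count as half (Mann-Whitney U form). The O(n_pos * n_neg) loop is for
    illustration only; it would be far too slow for a cohort of 217,254 adults.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


Point = Tuple[float, ...]


def k_medoids(
    points: List[Point],
    k: int,
    distance: Callable[[Point, Point], float],
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids, an assumed variant.

    Returns (medoid indices, cluster label per point). Unlike k-means centroids,
    medoids are always actual data points.
    """
    rng = random.Random(seed)
    n = len(points)
    medoids = rng.sample(range(n), k)

    def assign(meds: List[int]) -> List[int]:
        # Assignment step: each point joins the cluster of its nearest medoid.
        return [
            min(range(k), key=lambda c: distance(points[i], points[meds[c]]))
            for i in range(n)
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # Update step: pick the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(distance(points[m], points[j]) for j in members))
            )
        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def manhattan(a: Point, b: Point) -> float:
    # Assumed dissimilarity for numeric toy data only.
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy scores from a hypothetical classifier; not data from the study.
    y = [0, 0, 1, 1, 0, 1]
    s = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
    print(round(roc_auc(y, s), 4))  # 0.8889
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    meds, labs = k_medoids(pts, k=2, distance=manhattan)
    print(sorted(meds), labs)
```

On the toy points, the two well-separated groups end up in separate clusters. Averaging `roc_auc` over several classifiers' held-out scores would give a summary comparable in spirit to the reported average AUC, although the abstract does not say how that average was computed.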