Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation.

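Computer science; Random forest; Support vector machine; Internal medicine; Health records; Health informatics; Big data; Clinical decision support system; Feature selection; Data mining; Cardiology; Disease; Intensive care medicine; Decision tree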
Authors
Zhenzhen Du, Yujie Yang, Jing Zheng, Qi Li, Denan Lin, Ye Li, Jianping Fan, Wen Cheng, Xie-Hui Chen, Yunpeng Cai
Source
Journal: JMIR Medical Informatics [JMIR Publications Inc.]
Volume (issue): 8 (7) Cited by: 9
Identifier
DOI: 10.2196/17257
Abstract

Background: Prediction of cardiovascular disease risk from health records has long attracted broad research interest. Despite extensive efforts, prediction accuracy has remained unsatisfactory. This raises the question of whether data insufficiency, the statistical and machine-learning methods used, or intrinsic noise has hindered the performance of previous approaches, and how these issues can be alleviated.

Objective: Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model through big data and machine learning.

Methods: Data from a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset, were drawn from electronic health records (EHRs) 1-3 years prior to CHD onset (for CHD-positive cases) or during a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were applied to the training set to obtain high-accuracy prediction models, and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on prediction performance.

Results: An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset on the independent test dataset, with an area under the receiver operating characteristic curve (AUC) of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperformed linear models (logistic regression AUC 0.865) on the same datasets, and that machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, improved performance compared with using only static features. Subpopulation analysis showed that feature design had a more significant effect on model accuracy than population size. Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores.

Conclusions: We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, EHR data accumulated over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease prediction.
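The abstract mentions time-dependent features built from multiple records, covering both statistical variables and changing-trend variables, but does not list them. The following is a minimal sketch of what such features could look like for one repeated measurement. The function name `summarize_series`, the particular summary statistics, the least-squares slope as the trend variable, and the sample blood pressure readings are all illustrative assumptions, not the authors' feature set.

```python
from statistics import mean, pstdev


def summarize_series(times, values):
    """Summarize one repeated measurement into static-style features.

    `times` and `values` are parallel lists, assumed sorted by time.
    The chosen features are hypothetical examples of "statistical" and
    "changing-trend" variables; the paper's actual features may differ.
    """
    if len(times) != len(values) or not values:
        raise ValueError("times and values must be non-empty and equal in length")

    t_mean = mean(times)
    v_mean = mean(values)

    # Ordinary least-squares slope of value against time (the trend variable).
    # With a single time point the slope is undefined, so fall back to 0.0.
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom > 0:
        slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / denom
    else:
        slope = 0.0

    return {
        "mean": v_mean,           # statistical variables
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
        "last": values[-1],       # most recent reading
        "slope": slope,           # change per unit of time
    }


# Hypothetical systolic blood pressure readings (mmHg) at months 0, 6, 12, 18.
features = summarize_series([0, 6, 12, 18], [140, 144, 150, 152])
print(features)
```

In a setup like this, each patient's repeated measurements would be collapsed into one fixed-length feature vector, which can then be passed to any tabular classifier alongside the static features.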