Missing data
Computer science
Imputation (statistics)
Data mining
Machine learning
Authors
Janani Venugopalan, Nikhil K. Chanani, Kevin Maher, May D. Wang
Identifier
DOI:10.1109/jbhi.2018.2883606
Abstract
The diversity and number of parameters monitored in an intensive care unit (ICU) make the resulting databases highly susceptible to quality issues, such as missing information and erroneous data entry, which adversely affect downstream processing and predictive modeling. Missing data interpolation and imputation techniques, such as multiple imputation, expectation maximization, and hot-deck imputation, do not account for the type of missing data, which can lead to bias. In our study, we first model the missing data as three types: "neglectable," a.k.a. "missing completely at random"; "recoverable," a.k.a. "missing at random"; and "not easily recoverable," a.k.a. "missing not at random." We then design imputation techniques for each type of missing data. We use a publicly available database (MIMIC II) to demonstrate how these imputations perform with random forests for prediction. Our results indicate that these novel imputation techniques outperformed standard mean filling techniques and expectation maximization with statistical significance (p ≤ 0.01) in predicting ICU mortality.
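To make the three missingness types concrete, the following is a minimal Python sketch that simulates each mechanism on a toy variable and applies the mean-filling baseline named in the abstract. It is not the authors' method: the abstract does not describe their type-specific imputation techniques, so none are reproduced here. The function names (`simulate_missing`, `mean_fill`), the median-based thresholds, and the `rate` parameter are all assumptions made for illustration.

```python
import random
import statistics


def simulate_missing(values, covariate, mechanism, rate=0.3, seed=0):
    """Return a copy of `values` with some entries replaced by None.

    Hypothetical helper for illustration only.
    MCAR: every entry is dropped with the same probability.
    MAR:  drop probability depends on an observed covariate.
    MNAR: drop probability depends on the (unobserved) value itself.
    """
    rng = random.Random(seed)
    cov_median = statistics.median(covariate)
    val_median = statistics.median(values)
    out = []
    for v, c in zip(values, covariate):
        if mechanism == "MCAR":
            p = rate
        elif mechanism == "MAR":
            # Assumption: only rows with a high covariate can go missing.
            p = 2 * rate if c > cov_median else 0.0
        elif mechanism == "MNAR":
            # Assumption: only high values of the variable can go missing.
            p = 2 * rate if v > val_median else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else v)
    return out


def mean_fill(values):
    """Baseline: replace each None with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


if __name__ == "__main__":
    rng = random.Random(42)
    covariate = [rng.gauss(0, 1) for _ in range(1000)]
    # Toy variable correlated with the covariate.
    values = [c + rng.gauss(0, 1) for c in covariate]
    print("true mean:", round(statistics.fmean(values), 3))
    for mech in ("MCAR", "MAR", "MNAR"):
        filled = mean_fill(simulate_missing(values, covariate, mech))
        print(mech, "mean after mean filling:", round(statistics.fmean(filled), 3))
```

In this toy setup, mean filling leaves the estimate roughly unbiased under "missing completely at random" but shifts it when missingness depends on the covariate or on the value itself, which illustrates the kind of bias the abstract attributes to type-agnostic imputation.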