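Data stream mining
Computer science
Missing data
Data mining
Recurrent neural network
Artificial neural network
Temporal database
Deep learning
Mean squared error
Data stream
Artificial intelligence
Machine learning
Statistics
Mathematics
Telecommunications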
Authors
Jinsung Yoon, William R. Zame, Mihaela van der Schaar
Identifier
DOI: 10.1109/tbme.2018.2874712
Abstract
Missing data is a ubiquitous problem. It is especially challenging in medical settings because many streams of measurements are collected at different, and often irregular, times. Accurate estimation of the missing measurements is critical for many reasons, including diagnosis, prognosis, and treatment. Existing methods address this estimation problem by interpolating within data streams or by imputing across data streams (both of which ignore important information). Other methods ignore the temporal aspect of the data and impose strong assumptions about the nature of the data-generating process and/or the pattern of missing data (both of which are especially problematic for medical data). We propose a new approach based on a novel deep learning architecture, which we call a Multi-directional Recurrent Neural Network, that both interpolates within data streams and imputes across data streams. We demonstrate the power of our approach by applying it to five real-world medical datasets. We show that it provides dramatically improved estimation of missing measurements compared with 11 state-of-the-art benchmarks (including spline and cubic interpolation, MICE, MissForest, matrix completion, and several RNN methods); typical improvements in root mean squared error (RMSE) are between 35% and 50%. Additional experiments on the same five datasets demonstrate that the improvements provided by our method are extremely robust.
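The abstract contrasts two directions of estimation. The first is interpolation within a stream, which works along time using only that stream's own readings. The second is imputation across streams, which works at a fixed time step using the other streams. It also says that methods are scored by RMSE on the missing measurements.

The sketch below illustrates only those two ideas and the scoring, under stated assumptions. It is NOT the authors' Multi-directional RNN. It swaps in plain linear interpolation (`numpy.interp`) for the within-stream step and ordinary least squares (`numpy.linalg.lstsq`) for the across-stream step. The helper names `interpolate_within`, `impute_across`, and `rmse_on_held_out`, and the toy data, are hypothetical.

```python
import numpy as np

def interpolate_within(x, mask, times):
    """Within-stream step (hypothetical helper): fill each column of x by
    linear interpolation over time, using only that column's observed
    entries (mask == 1). Assumes every stream has at least one reading."""
    out = x.copy()
    for d in range(x.shape[1]):
        obs = mask[:, d] == 1
        # np.interp needs increasing sample times; it holds end values
        # constant outside the observed range.
        out[~obs, d] = np.interp(times[~obs], times[obs], x[obs, d])
    return out

def impute_across(x_interp, mask):
    """Across-stream step (hypothetical helper): for each stream d, regress
    it on the other streams at the same time step (OLS with an intercept,
    fitted on rows where d was observed), then predict rows where d is missing."""
    n, n_streams = x_interp.shape
    out = x_interp.copy()
    for d in range(n_streams):
        obs = mask[:, d] == 1
        others = [j for j in range(n_streams) if j != d]
        design = np.column_stack([x_interp[:, others], np.ones(n)])
        coef, *_ = np.linalg.lstsq(design[obs], x_interp[obs, d], rcond=None)
        out[~obs, d] = design[~obs] @ coef
    return out

def rmse_on_held_out(x_true, x_hat, held_out):
    """RMSE computed only over entries flagged in held_out (1 = was missing)."""
    diff = (x_true - x_hat)[held_out == 1]
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy data (assumption): stream 2 is an exact linear function of stream 1,
# and stream 1 varies nonlinearly in time.
times = np.arange(8, dtype=float)
s1 = np.sin(times)
s2 = 2.0 * s1 + 1.0
x_true = np.column_stack([s1, s2])

mask = np.ones_like(x_true)
mask[[2, 5], 1] = 0                      # hide two readings of stream 2
x_obs = np.where(mask == 1, x_true, np.nan)
held = 1 - mask

x_lin = interpolate_within(x_obs, mask, times)
x_cross = impute_across(x_lin, mask)
print("within-stream RMSE:", rmse_on_held_out(x_true, x_lin, held))
print("across-stream RMSE:", rmse_on_held_out(x_true, x_cross, held))
```

On this contrived data the across-stream RMSE is essentially zero, because stream 2 is an exact linear function of stream 1. Linear interpolation over time misses the curvature of the sine and leaves a visible error. Real medical streams are neither exactly linear in time nor exactly linear in each other. That is the gap the abstract says a single model using both directions is meant to close.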