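Imputation (statistics)
Missing data
Computer science
Benchmarking
Data mining
Machine learning
Linear interpolation
Artificial intelligence
Interpolation (computer graphics)
Linear model
Robustness (evolution)
Deep learning
Ensemble learning
Data quality
Data type
Learning network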
Authors
Michael Poette, Sandrine Mouysset, Daniel Ruíz, Vincent Pey, Jean-Marc Alliot, Vincent Minville
Identifier
DOI: 10.1038/s41598-026-39035-z
Abstract
Handling missing data remains a central challenge in Intensive Care Unit (ICU) time-series analysis, where gaps frequently arise from non-random mechanisms such as sensor disconnections and workflow-driven interruptions. In this study, we benchmarked a range of imputation strategies on monitoring data from MIMIC-IV. We designed masking scenarios that reflect the ICU missingness patterns observed in the database, thereby approximating real-world conditions and clarifying how conclusions depend on both the imputation method and the missingness scenario. We compared three groups of methods. The first was commonly used simple statistical approaches: mean imputation, last observation carried forward (LOCF), and interpolation. The second was classical machine learning techniques: multivariate imputation by chained equations (MICE) and MissForest. The third was several deep learning architectures: Transformers, RNNs, GANs, and VAEs. Transformer and GAN models achieved the best overall performance, whereas linear interpolation remained a strong baseline. Crucially, results were scenario-dependent. The missing-completely-at-random (MCAR) scenario produced optimistic error estimates and compressed the differences between methods, whereas structured gaps revealed clearer separations in performance. Our findings suggest that, although deep learning methods improve overall imputation accuracy, linear interpolation is often nearly as effective while being lighter and more interpretable. This work introduces a practical framework for evaluating time-series imputation strategies under realistic constraints, with a focus on clinical relevance. Further work is needed to assess the downstream impact of imputation under clinically realistic scenarios and to evaluate imputation strategies tailored to each variable type.
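To make the evaluation idea concrete, the sketch below hides known values under two masking scenarios and then scores simple baselines only on the hidden positions. The first scenario is MCAR, with independent random drops. The second uses contiguous blocks, a crude stand-in for structured gaps such as sensor disconnections. The baselines are mean imputation, LOCF, and linear interpolation. This is not the authors' code. The function names, the synthetic signal, the 20% MCAR rate, and the five 20-step gaps are all hypothetical choices made for illustration, and no MIMIC-IV data is used. MICE, MissForest, and the deep learning models are omitted.

```python
import numpy as np

def mcar_mask(n, rate, rng):
    """Hypothetical helper: True where a value is hidden, each chosen independently (MCAR)."""
    return rng.random(n) < rate

def block_mask(n, block_len, n_blocks, rng):
    """Hypothetical helper: hide n_blocks contiguous gaps of block_len steps.
    A crude stand-in for structured ICU gaps, not the paper's actual scenarios."""
    mask = np.zeros(n, dtype=bool)
    starts = rng.choice(n - block_len, size=n_blocks, replace=False)
    for s in starts:
        mask[s:s + block_len] = True  # blocks may overlap, merging into longer gaps
    return mask

def impute_mean(x):
    """Replace every NaN with the mean of the observed values."""
    out = x.copy()
    out[np.isnan(out)] = np.nanmean(x)
    return out

def impute_locf(x):
    """Last observation carried forward; leading NaNs are back-filled with the first observation."""
    out = x.copy()
    last = np.nan
    for i in range(len(out)):
        if np.isnan(out[i]):
            out[i] = last
        else:
            last = out[i]
    # After forward filling, only leading NaNs can remain.
    out[np.isnan(out)] = out[~np.isnan(out)][0]
    return out

def impute_linear(x):
    """Linear interpolation between observed neighbours (np.interp holds edge values constant)."""
    out = x.copy()
    idx = np.arange(len(x))
    obs = ~np.isnan(x)
    out[~obs] = np.interp(idx[~obs], idx[obs], x[obs])
    return out

def masked_mae(truth, imputed, mask):
    """Mean absolute error computed only on the artificially hidden positions."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(500)
    # Synthetic, slowly varying "vital sign" with noise; purely illustrative.
    truth = 80 + 10 * np.sin(t / 40) + rng.normal(0, 1, t.size)
    scenarios = {
        "MCAR": mcar_mask(t.size, 0.2, rng),
        "blocks": block_mask(t.size, 20, 5, rng),
    }
    for name, mask in scenarios.items():
        x = truth.copy()
        x[mask] = np.nan
        for method in (impute_mean, impute_locf, impute_linear):
            print(name, method.__name__, round(masked_mae(truth, method(x), mask), 2))
```

Scoring only the masked positions is what makes the comparison fair: the ground truth is known there by construction. Swapping the MCAR mask for the block mask changes which errors are exposed, and that is the scenario dependence the abstract describes.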