Imputation (statistics)
Bayesian information criterion
Missing data
Akaike information criterion
Computer science
Overfitting
Model selection
Inference
Data mining
Statistics
Econometrics
Mathematics
Artificial intelligence
Machine learning
Artificial neural network
Authors
Firouzeh Noghrehchi, Jakub Stoklosa, Spiridon Penev, David I. Warton
Abstract
Multiple imputation and maximum likelihood estimation (via the expectation‐maximization algorithm) are two well‐known methods readily used for analyzing data with missing values. While these two methods are often considered as being distinct from one another, multiple imputation (when using improper imputation) is actually equivalent to a stochastic expectation‐maximization approximation to the likelihood. In this article, we exploit this key result to show that familiar likelihood‐based approaches to model selection, such as Akaike's information criterion (AIC) and the Bayesian information criterion (BIC), can be used to choose the imputation model that best fits the observed data. Poor choice of imputation model is known to bias inference, and while sensitivity analysis has often been used to explore the implications of different imputation models, we show that the data can be used to choose an appropriate imputation model via conventional model selection tools. We show that BIC can be consistent for selecting the correct imputation model in the presence of missing data. We verify these results empirically through simulation studies, and demonstrate their practicality on two classical missing data examples. An interesting result we saw in simulations was that not only can parameter estimates be biased by misspecifying the imputation model, but also by overfitting the imputation model. This emphasizes the importance of using model selection not just to choose the appropriate type of imputation model, but also to decide on the appropriate level of imputation model complexity.
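The abstract's central idea, that likelihood-based criteria such as BIC can rank candidate imputation models against the observed data, can be illustrated with a minimal sketch. The example below is not the authors' procedure; it simply fits polynomial-regression imputation models of increasing complexity to the observed cases of a variable with missing values, computes each model's BIC from its Gaussian log-likelihood, and selects the minimizer. The simulated data, the candidate model family, and the helper `gaussian_bic` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[rng.random(n) < 0.3] = np.nan  # roughly 30% of y missing at random

obs = ~np.isnan(y)
x_obs, y_obs = x[obs], y[obs]
n_obs = int(obs.sum())

def gaussian_bic(degree):
    """BIC of a degree-`degree` polynomial imputation model fit to observed cases."""
    X = np.vander(x_obs, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    resid = y_obs - X @ beta
    sigma2 = resid @ resid / n_obs                     # MLE of the error variance
    loglik = -0.5 * n_obs * (np.log(2 * np.pi * sigma2) + 1)
    k = degree + 2                                     # coefficients plus variance
    return k * np.log(n_obs) - 2 * loglik

bics = {d: gaussian_bic(d) for d in range(4)}
best = min(bics, key=bics.get)
print(best)  # BIC typically recovers the true (linear) imputation model here
```

Because the data were generated from a linear model, the degree-0 candidate underfits badly while degrees 2 and 3 add parameters without materially improving the fit, so BIC's complexity penalty steers the choice toward degree 1, mirroring the abstract's point that both misspecification and overfitting of the imputation model are worth guarding against.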