Deep learning versus conventional methods for missing data imputation: A review and comparative study

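Computer science, Imputation (statistics), Missing data, Robustness (evolution), Deep learning, Artificial intelligence, Embedding, Sample size determination, Inference, Machine learning, Data mining, Statistics, Mathematics, Biochemistry, Chemistry, Gene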
Authors
Yige Sun, Jing Li, Yifan Xu, Tingting Zhang, Xiaofeng Wang
Source
Journal: Expert Systems With Applications [Elsevier BV]
Volume 227, article 120201 · Times cited: 173
Identifier
DOI: 10.1016/j.eswa.2023.120201
Abstract

Deep learning models have recently been proposed for missing data imputation. In this paper, we review popular statistical, machine learning, and deep learning approaches and discuss the advantages and disadvantages of each. We conduct a comprehensive numerical study comparing several widely used imputation methods for incomplete tabular (structured) data. Specifically, we compare four deep learning methods, namely generative adversarial imputation networks (GAIN) with one-hot encoding, GAIN with embedding, variational auto-encoders (VAE) with one-hot encoding, and VAE with embedding, against two conventional methods: multiple imputation by chained equations (MICE) and missForest. We consider seven real benchmark datasets and three simulated datasets, covering scenarios with different feature types and a range of sample sizes. Missing data are generated at different missing ratios under three missing mechanisms: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Our experiments show that, for small or moderate sample sizes, the conventional methods are more robust and impute more accurately than the deep learning methods. GAIN performs well only under MCAR and often fails under MAR and MNAR. VAEs are prone to mode collapse under all three missing mechanisms. We conclude that the conventional methods, MICE and missForest, are preferable for practitioners imputing missing values in tabular data with a limited sample size (i.e., n < 30,000) in real-world analyses.
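The abstract describes masking complete data at a chosen missing ratio under MCAR, MAR, and MNAR, then scoring imputations against the hidden values. The paper's exact masking procedure is not given here, so the following is only a minimal sketch under stated assumptions: rows are lists of numbers, column 0 is kept fully observed so it can drive MAR, and the per-entry missing probability grows linearly with rank (of column 0 for MAR, of the entry's own value for MNAR) so that the expected ratio equals the target. The helpers `make_mask`, `mean_impute`, and `rmse_on_missing` are hypothetical names, and mean imputation is a placeholder baseline, not MICE, missForest, GAIN, or a VAE.

```python
import math
import random


def _ranks(values):
    """Rank of each value within the list (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def make_mask(data, ratio, mechanism, seed=0):
    """Return a mask (True = missing) for a list of numeric rows.

    Assumption: column 0 is always observed (it drives MAR); columns 1..d-1
    are masked with an expected missing ratio of `ratio`.
    """
    if not 0.0 <= ratio <= 0.5:
        # 2 * ratio is the largest per-entry probability used below
        raise ValueError("this sketch supports 0 <= ratio <= 0.5")
    n, d = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    mask = [[False] * d for _ in range(n)]
    driver_ranks = _ranks([row[0] for row in data])
    for j in range(1, d):
        own_ranks = _ranks([row[j] for row in data])
        for i in range(n):
            if mechanism == "MCAR":
                p = ratio  # same probability for every entry
            elif mechanism == "MAR":
                p = 2 * ratio * driver_ranks[i] / (n - 1)  # depends on observed column 0
            elif mechanism == "MNAR":
                p = 2 * ratio * own_ranks[i] / (n - 1)  # depends on the hidden value itself
            else:
                raise ValueError(f"unknown mechanism: {mechanism}")
            mask[i][j] = rng.random() < p
    return mask


def mean_impute(data, mask):
    """Placeholder baseline: fill each missing entry with its column's observed mean."""
    n, d = len(data), len(data[0])
    out = [row[:] for row in data]
    for j in range(d):
        observed = [data[i][j] for i in range(n) if not mask[i][j]]
        fill = sum(observed) / len(observed) if observed else 0.0
        for i in range(n):
            if mask[i][j]:
                out[i][j] = fill
    return out


def rmse_on_missing(truth, imputed, mask):
    """RMSE over the masked entries only, where the true values are known."""
    errs = [(truth[i][j] - imputed[i][j]) ** 2
            for i in range(len(truth)) for j in range(len(truth[0]))
            if mask[i][j]]
    return math.sqrt(sum(errs) / len(errs)) if errs else 0.0


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0, 1) for _ in range(2000)]
    data = [[x, 2 * x + rng.gauss(0, 1)] for x in xs]  # column 1 correlates with column 0
    for mech in ("MCAR", "MAR", "MNAR"):
        m = make_mask(data, 0.2, mech, seed=42)
        rate = sum(row[1] for row in m) / len(m)
        err = rmse_on_missing(data, mean_impute(data, m), m)
        print(f"{mech}: missing rate {rate:.3f}, mean-imputation RMSE {err:.3f}")
```

Because the average of `2 * ratio * r / (n - 1)` over ranks `r = 0..n-1` is exactly `ratio`, all three mechanisms hit the same expected missing ratio and differ only in which entries go missing. Under MNAR the larger values are hidden more often, so any mean computed from the observed entries understates them. The rank-based probabilities are an illustrative choice; a real comparison would replace `mean_impute` with MICE, missForest, or a deep model.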