Computer science
Imputation (statistics)
Missing data
Inference
Gaussian distribution
Parametric statistics
Global Positioning System
Data mining
Artificial intelligence
Statistics
Machine learning
Mathematics
Quantum mechanics
Telecommunications
Physics
Authors
Bahram Jafrasteh,Daniel Hernández-Lobato,Simón Pedro Lubián‐López,Isabel Benavente‐Fernández
Identifiers
DOI:10.1016/j.knosys.2023.110603
Abstract
A missing value indicates that a particular attribute of an instance of a learning problem is not recorded. Missing values are very common in real-life datasets. Despite this, most machine learning methods cannot handle them, so they must be imputed before training. Gaussian Processes (GPs) are non-parametric models with accurate uncertainty estimates that, combined with sparse approximations and stochastic variational inference, scale to large datasets. Sparse GPs (SGPs) can be used to obtain a predictive distribution for missing values. We present a hierarchical composition of sparse GPs that predicts the missing values in each dimension using the observed values from the other dimensions. Importantly, we allow the input attributes to each sparse GP used for prediction to have missing values themselves; those missing inputs are replaced by the predictions of the previous sparse GPs in the hierarchy. We call our approach missing GP (MGP). MGP can impute all observed missing values, outputting a predictive distribution for each missing value that is then used in the imputation of other missing values. We evaluate MGP on one private clinical dataset and on four UCI datasets with different percentages of missing values. Furthermore, we compare the performance of MGP with other state-of-the-art methods for imputing missing values, including variants based on sparse GPs and deep GPs. Our results show that the performance of MGP is significantly better.
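To make the hierarchical idea in the abstract concrete, the following is a minimal NumPy sketch of per-dimension GP imputation. It is illustrative only: it uses a single-pass exact GP with an RBF kernel and returns only the predictive mean, whereas the paper's MGP uses sparse variational GPs trained jointly and propagates full predictive distributions. All function names and hyperparameters here are assumptions, not from the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def gp_predict_mean(X_tr, y_tr, X_te, noise=1e-2):
    # Posterior mean of an exact GP regressor (stand-in for a sparse GP).
    K = rbf_kernel(X_tr, X_tr) + noise * np.eye(len(X_tr))
    return rbf_kernel(X_te, X_tr) @ np.linalg.solve(K, y_tr)

def hierarchical_gp_impute(X):
    """Impute NaNs one dimension at a time: the GP for dimension d takes
    the other dimensions as inputs, whose own missing entries have been
    filled by earlier GPs in the hierarchy (mean-filled initially)."""
    X = np.array(X, dtype=float)
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # provisional fill
    for d in range(X.shape[1]):
        obs, mis = ~mask[:, d], mask[:, d]
        if not mis.any() or not obs.any():
            continue
        other = np.delete(np.arange(X.shape[1]), d)
        # Fit on rows where d is observed; predict where it is missing.
        X[mis, d] = gp_predict_mean(X[np.ix_(obs, other)], X[obs, d],
                                    X[np.ix_(mis, other)])
    return X
```

Observed entries are never overwritten; only the NaN positions receive GP predictions, mirroring the abstract's description of imputing each dimension from the others.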