Computer science
Feature selection
Outlier
Robustness
Lasso (statistics)
Pattern recognition
Noise
Artificial intelligence
Linear regression
Elastic net regularization
Feature (machine learning)
Regression
Regression analysis
Machine learning
Statistics
Mathematics
Image (mathematics)
World Wide Web
Philosophy
Gene
Biochemistry
Chemistry
Linguistics
Authors
Yaqing Guo, Wenjian Wang, Xuejun Wang
Identifier
DOI: 10.1109/tkde.2021.3076891
Abstract
The linear regression model is simple in form and easy to estimate; however, irrelevant features increase the difficulty of its estimation tasks. Feature selection is generally adopted to improve model performance. Unfortunately, traditional regression feature selection methods may not work for data with noise or outliers. Although some robust methods for certain specific error distributions have been proposed, they may not perform well because the distribution of the representation error is often unknown for real data. This paper proposes a regression feature selection method for unknown noise named Mixture of Gaussians LASSO (MoG-LASSO), in which feature selection and model training are achieved simultaneously. A MoG is adopted to model the unknown noise, and M-estimation is used to derive a weighted squared-error loss. By alternately and iteratively updating the regression coefficients and the MoG parameters, the influence of unknown noise can be reduced effectively. Furthermore, MoG-LASSO achieves feature selection via the $L_{1}$ regularization term, which further improves model performance. Experimental results on artificial data and benchmark data sets demonstrate that MoG-LASSO has better robustness and sparsity for data sets with irrelevant features. Additionally, experimental results on face recognition databases show the performance advantage of MoG-LASSO over state-of-the-art methods in the presence of illumination variations.
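The abstract describes an alternating scheme: fit a Gaussian-mixture model to the residual noise, convert the mixture responsibilities into per-sample weights, and then solve a weighted L1-regularized regression. The following is a minimal, illustrative Python sketch of that idea under stated assumptions, not the authors' implementation: it assumes a zero-mean Gaussian mixture on the residuals, updates the mixture with EM-style responsibility steps, and uses scikit-learn's Lasso with sample weights for the weighted squared-error step. All names (mog_lasso, lam, n_components, n_iter) are illustrative.

import numpy as np
from sklearn.linear_model import Lasso

def mog_lasso(X, y, lam=0.1, n_components=2, n_iter=50):
    """Illustrative sketch: alternating MoG noise modeling and weighted lasso."""
    n, d = X.shape
    beta = np.zeros(d)
    # Initialize mixture: equal mixing weights, one "clean" and one heavier-tailed component.
    pi = np.full(n_components, 1.0 / n_components)
    var = np.var(y) * (10.0 ** np.arange(n_components))
    for _ in range(n_iter):
        r = y - X @ beta  # current residuals
        # E-step: responsibility of each zero-mean Gaussian component for each residual.
        dens = np.stack([
            pi[k] * np.exp(-0.5 * r**2 / var[k]) / np.sqrt(2 * np.pi * var[k])
            for k in range(n_components)
        ], axis=1)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step (mixture): update mixing weights and component variances.
        Nk = resp.sum(axis=0)
        pi = Nk / n
        var = (resp * r[:, None]**2).sum(axis=0) / np.maximum(Nk, 1e-12)
        var = np.maximum(var, 1e-8)  # guard against collapsed components
        # M-step (regression): per-sample weights from expected inverse noise variance,
        # then solve a weighted L1-regularized least-squares problem.
        w = (resp / var[None, :]).sum(axis=1)
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(X, y, sample_weight=w)
        beta = model.coef_
    return beta

# Toy usage: sparse true coefficients plus a few gross outliers in y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=200)
y[:10] += 20.0 * rng.normal(size=10)  # heavy-tailed contamination
print(np.round(mog_lasso(X, y, lam=0.05), 2))

In this sketch the samples dominated by the high-variance mixture component receive small weights, so outliers contribute little to the lasso step, which is the intuition behind the robustness claim in the abstract.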