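Overfitting
Feature selection
Computer science
Genetic programming
Artificial intelligence
Feature (linguistics)
Curse of dimensionality
Pattern recognition (psychology)
Minimum redundancy feature selection
Data mining
Machine learning
Dimensionality reduction
Set (abstract data type)
Artificial neural network
Philosophy
Linguistics
Programming language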
Authors
Binh Tran,Bing Xue,Mengjie Zhang
Source
Journal: Memetic Computing
[Springer Science+Business Media]
Date: 2015-12-19
Volume (issue): 8 (1): 3-15
Citations: 149
Identifier
DOI:10.1007/s12293-015-0173-y
Abstract
Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study to investigate the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed feature can achieve promising classification performance.
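To illustrate how a single GP tree can serve both purposes named in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes a tree is stored as nested tuples, that leaves are integer indices of original features, and that the function set is `+`, `-`, `*`; the abstract specifies none of these. Evaluating the tree yields the constructed feature, and the leaves it uses form the implicitly selected subset.

```python
import operator

# Function set for internal nodes (an assumption; the abstract does not list one).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row):
    """Return the constructed feature value for one sample."""
    if isinstance(tree, int):  # leaf: index of an original feature
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def selected_features(tree):
    """Return the sorted indices of the original features used as leaves."""
    if isinstance(tree, int):
        return [tree]
    _, left, right = tree
    return sorted(set(selected_features(left)) | set(selected_features(right)))


# Hypothetical hand-written tree encoding (x0 * x3) - x7,
# applied to one made-up sample with 8 features.
tree = ("-", ("*", 0, 3), 7)
row = [2.0, 9.0, 9.0, 5.0, 9.0, 9.0, 9.0, 4.0]

constructed = evaluate(tree, row)    # value of the new high-level feature
selected = selected_features(tree)   # original features the tree relies on
```

In this sketch the tree is written by hand; in a GP setting it would be evolved under a fitness function, which is omitted here.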