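Random forest
Feature selection
Computer science
Pattern recognition (psychology)
Support vector machine
Artificial intelligence
Classifier (UML)
Data mining
Feature vector
Feature (linguistics)
Redundancy (engineering)
Philosophy
Operating system
Linguistics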
Authors
Dengju Yao,Jing Yang,Xiaojuan Zhan,Xiaorong Zhan,Zhiqiang Xie
Identifier
DOI:10.1504/ijdmb.2015.070852
Abstract
High-dimensional data and large numbers of redundant features in bioinformatics research have created an urgent need for feature selection. This paper proposes a novel random forests-based feature selection method that stratifies the feature space and combines generalised sequential backward search with generalised sequential forward search. Features are ranked by their random forest variable importance scores, and different classifiers serve as the feature subset evaluation function. The method is evaluated on five microarray expression datasets (leukaemia, prostate, breast, nervous and DLBCL), on which the SVM classifier achieves average accuracies of 100%, 95.24%, 85%, 91.67%, and 91.67%, respectively. The results show that the proposed method not only improves classification accuracy but also greatly reduces the computation time of the feature selection process.
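The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes importance scores are already available (for example from a random forest's variable importance, such as scikit-learn's `feature_importances_`) and that `evaluate` returns a classifier's accuracy for a feature subset. The function names, the equal-size stratification, and the stopping rules are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical evaluation signature: feature names in, accuracy-like score out.
Score = Callable[[Sequence[str]], float]


def rank_features(importance: Dict[str, float]) -> List[str]:
    """Order feature names from most to least important."""
    return sorted(importance, key=lambda name: importance[name], reverse=True)


def stratify(ranked: Sequence[str], n_strata: int) -> List[List[str]]:
    """Split a ranked list into contiguous strata of near-equal size (an assumption)."""
    size = -(-len(ranked) // n_strata)  # ceiling division
    return [list(ranked[i:i + size]) for i in range(0, len(ranked), size)]


def forward_by_stratum(strata: List[List[str]], evaluate: Score) -> List[str]:
    """Generalised forward step: add a whole stratum at a time if the score improves."""
    selected: List[str] = []
    best = float("-inf")
    for stratum in strata:
        candidate = selected + stratum
        score = evaluate(candidate)
        if score > best:
            selected, best = candidate, score
    return selected


def backward_in_groups(selected: Sequence[str], evaluate: Score, group_size: int) -> List[str]:
    """Generalised backward step: drop the `group_size` lowest-ranked features
    at a time for as long as the score does not get worse."""
    current = list(selected)
    best = evaluate(current)
    while len(current) > group_size:
        candidate = current[:-group_size]
        score = evaluate(candidate)
        if score < best:
            break
        current, best = candidate, score
    return current


# Toy data (invented for illustration): g1..g3 are informative, the rest are noise.
importance = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.1, "g5": 0.05, "g6": 0.01}
informative = {"g1", "g2", "g3"}


def toy_evaluate(features: Sequence[str]) -> float:
    # Stand-in for cross-validated accuracy: reward informative features,
    # apply a small penalty per feature used.
    return len(informative & set(features)) - 0.01 * len(features)


ranked = rank_features(importance)
strata = stratify(ranked, n_strata=3)
after_forward = forward_by_stratum(strata, toy_evaluate)
final_subset = backward_in_groups(after_forward, toy_evaluate, group_size=1)
print(final_subset)
```

Evaluating whole strata instead of single features keeps the number of classifier evaluations small, which is one plausible reading of the reduced computation time the abstract reports. In practice `toy_evaluate` would be replaced by a real classifier score, such as the cross-validated accuracy of an SVM.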