特征选择
粒子群优化
计算机科学
特征(语言学)
聚类分析
元启发式
多群优化
二进制数
模式识别(心理学)
算法
人工智能
群体行为
数据挖掘
数学
算术
哲学
语言学
作者
Guicheng Yang,Wei Li,Weidong Xie,Linjie Wang,Kun Yu
标识
DOI:10.1016/j.cmpb.2023.107987
摘要
The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms are difficult to search for the optimal set of features in a limited time when dealing with the high-dimensional feature selection problem. New solutions are proposed to solve the above problems. In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. Firstly, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Secondly, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), better applicable to the binary particle swarm optimization problem. Finally, we also design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study of this method validates the efficacy of each component, especially the proposed feature elimination strategy significantly improves the performance of the algorithm. The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
科研通智能强力驱动
Strongly Powered by AbleSci AI