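Missing data
Cluster analysis
Particle swarm optimization
Feature selection
Initialization
Computer science
Rand index
Data mining
Algorithm
Fuzzy clustering
Measure (data warehouse)
Bhattacharyya distance
Feature vector
Artificial intelligence
Pattern recognition (psychology)
Machine learning
Programming language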
Authors
Zhang Yon, Wang Yan-hu, Dunwei Gong, Xiaoyan Sun
Identifier
DOI:10.1109/tevc.2021.3106975
Abstract
Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers because both characteristics are common in real-world applications. However, for data that exhibit both characteristics at once, a corresponding FS algorithm is still lacking. Because missing data and class imbalance are coupled in complex ways, a better FS method is needed. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved $F$-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance under class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific strategies, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. Compared with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC achieves excellent classification performance in relatively less running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
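To make the general idea concrete, here is a minimal sketch of particle swarm feature selection scored by an F-measure on imbalanced data with missing values. It is not PSOFS-FC: the abstract does not define the RF-measure, so the plain F-measure stands in for it; the swarm is initialized at random rather than by fuzzy clustering; and there is no local pruning operator. Mean imputation, the nearest-centroid classifier, the PSO coefficients, and every function name (`mean_impute`, `f_measure`, `fitness`, `binary_pso_fs`) are assumptions made only for illustration.

```python
import numpy as np


def mean_impute(X):
    """Fill each NaN with its column mean (an assumed, simple filling rule)."""
    col_mean = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_mean, X)


def f_measure(y_true, y_pred, positive=1):
    """Plain F-measure (F1) for the positive, i.e. minority, class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom > 0 else 0.0


def fitness(mask, X, y):
    """F-measure of a nearest-centroid classifier on the selected features.

    Scored on the training data itself to keep the sketch short.
    """
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    Xs = X[:, mask]
    c0 = Xs[y == 0].mean(axis=0)
    c1 = Xs[y == 1].mean(axis=0)
    d0 = np.linalg.norm(Xs - c0, axis=1)
    d1 = np.linalg.norm(Xs - c1, axis=1)
    return f_measure(y, (d1 < d0).astype(int))


def binary_pso_fs(X, y, n_particles=10, n_iter=20, seed=0):
    """Basic PSO over [0, 1]^d; a feature is selected when its position > 0.5."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, X.shape[1]))  # random initialization
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p > 0.5, X, y) for p in pos])
    g = pbest_fit.argmax()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive + social terms.
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        fit = np.array([fitness(p > 0.5, X, y) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmax()
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest > 0.5, gbest_fit


# Synthetic data: 90 majority vs. 10 minority samples, 8 features,
# only feature 0 is informative, and about 5% of entries are missing.
rng = np.random.default_rng(1)
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 8))
X[y == 1, 0] += 3.0
X[rng.random(X.shape) < 0.05] = np.nan

mask, score = binary_pso_fs(mean_impute(X), y)
print("selected features:", np.flatnonzero(mask))
print("F-measure:", round(float(score), 3))
```

The fitness here ignores how risky each imputed value is, which is the gap the abstract says the RF-measure is meant to address. Swapping in a filling-risk-aware objective would only require replacing `fitness`.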