Feature selection
Computer science
Metaheuristic
Data mining
Mutual information
Maximization
Classifier (UML)
Feature (linguistics)
Artificial intelligence
Redundancy (engineering)
Machine learning
Algorithm
Pattern recognition (psychology)
Mathematical optimization
Mathematics
Philosophy
Operating system
Linguistics
Authors
Anurag Tiwari, Amrita Chaturvedi
Identifier
DOI: 10.1016/j.eswa.2022.116621
Abstract
The ubiquitous usage of feature selection in search space optimization, information retrieval, data mining, signal processing, software fault prediction, and bioinformatics makes it paramount to expert and intelligent systems. Most conventional feature selection methods are based on filter and wrapper approaches, which suffer from poor classification accuracy, high computational cost, and the selection of irrelevant and redundant features. This stems from limitations of the employed objective functions, which overestimate feature significance. In contrast, hybrid feature selection methods built from information theory and nature-inspired metaheuristic algorithms are preferred for their high computational efficiency, scalability in avoiding redundant and less informative features, and independence from the classifier. However, these methods share three common drawbacks: (1) a poor trade-off between the exploration and exploitation phases, (2) getting stuck in local optima, and (3) failure to avoid irrelevance and redundancy among the selected features. The first two drawbacks relate to the metaheuristic algorithm's implementation, while the third concerns the applied information-theoretic paradigms. To address these problems, we developed a new hybrid feature selection method, the Iterative Feature Selection using Dynamic Butterfly Optimization Algorithm based Interaction Maximization (IFS-DBOIM), which combines the Dynamic Butterfly Optimization Algorithm (DBOA) with a mutual information-based Feature Interaction Maximization (FIM) scheme to select the optimal feature subset. There is evidence that DBOA performs better in exploration, exploitation, and avoidance of local-optimum entrapment, while FIM scores new features for maximum relevance and minimum redundancy with respect to previously selected ones.
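The FIM scheme belongs to the broader family of mutual-information-based selection criteria. As an illustrative sketch only (not the authors' exact FIM formulation), a greedy mRMR-style selector scores each candidate feature by its relevance to the label minus its average redundancy with already-selected features; the helper names below (`mutual_info`, `greedy_mi_select`) are hypothetical:

```python
import numpy as np

def mutual_info(x, y):
    # Mutual information (in nats) between two discrete-valued arrays,
    # estimated from the empirical joint distribution.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def greedy_mi_select(X, y, k):
    # Greedily pick k features: maximize relevance MI(f, y) while
    # penalizing average redundancy MI(f, s) with selected features s.
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            red = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, given a feature that copies the label, a noisy copy, and pure noise, the exact copy is chosen first because it has maximal relevance.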
The performance of the proposed method is compared against ten baseline feature selection approaches on twenty publicly available datasets. The results reveal that IFS-DBOIM outperforms the other approaches on most datasets, maximizing the percent classification accuracy with the fewest features. The nonparametric Wilcoxon rank test confirms the statistical significance of these outcomes. Moreover, the method achieves the best trade-off between accuracy and stability.
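The Wilcoxon test used above compares paired per-dataset results of two methods without assuming normality. A minimal self-contained sketch of the exact two-sided signed-rank test follows; the accuracy numbers in the example are made up for illustration and are not the paper's results:

```python
from itertools import product

def wilcoxon_signed_rank(x, y):
    # Exact two-sided Wilcoxon signed-rank test for small paired samples
    # (enumerates all 2^n sign patterns, so use only for small n).
    # Returns (W, p), where W is the sum of ranks of positive differences.
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    # Rank absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 2) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_pos = sum(r for r, di in zip(ranks, d) if di > 0)
    # Null distribution: each sign pattern is equally likely.
    dist = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product([0, 1], repeat=n)]
    mean = sum(ranks) / 2
    extreme = sum(1 for w in dist if abs(w - mean) >= abs(w_pos - mean))
    return w_pos, extreme / len(dist)

# Hypothetical paired accuracies (%) of two methods on ten datasets.
acc_proposed = [91.2, 88.5, 93.1, 85.0, 90.4, 87.3, 94.2, 89.9, 86.1, 92.0]
acc_baseline = [89.0, 86.2, 91.5, 84.1, 88.0, 85.9, 92.9, 88.4, 85.0, 90.3]
w, p = wilcoxon_signed_rank(acc_proposed, acc_baseline)
```

When one method wins on every dataset, as in the hypothetical numbers above, the exact two-sided p-value is 2/2^n, well below the usual 0.05 threshold for n = 10.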