Feature selection
Computer science
Metaheuristic
Data mining
Mutual information
Maximization
Classifier (UML)
Feature (linguistics)
Artificial intelligence
Redundancy (engineering)
Machine learning
Algorithm
Pattern recognition (psychology)
Mathematical optimization
Mathematics
Philosophy
Operating system
Linguistics
Authors
Anurag Tiwari, Amrita Chaturvedi
Identifier
DOI: 10.1016/j.eswa.2022.116621
Abstract
The ubiquitous usage of feature selection in search space optimization, information retrieval, data mining, signal processing, software fault prediction, and bioinformatics makes it paramount to expert and intelligent systems. Most conventional feature selection methods are based on filter and wrapper approaches, which suffer from poor classification accuracy, high computational cost, and the selection of irrelevant and redundant features. This stems from limitations of the employed objective functions, which overestimate feature significance. In contrast, hybrid feature selection methods built from information theory and nature-inspired metaheuristic algorithms are preferred for their high computational efficiency, scalability in avoiding redundant and less informative features, and independence from the classifier. However, these methods share three common drawbacks: (1) a poor trade-off between the exploration and exploitation phases, (2) getting stuck in local optima, and (3) failure to avoid irrelevance and redundancy among the selected features. The first two drawbacks relate to the metaheuristic algorithm's implementation, while the third concerns the applied information-theoretic paradigms. To address these problems, we developed a new hybrid feature selection method, the Iterative Feature Selection using Dynamic Butterfly Optimization Algorithm based Interaction Maximization (IFS-DBOIM), which combines the Dynamic Butterfly Optimization Algorithm (DBOA) with a mutual information-based Feature Interaction Maximization (FIM) scheme to select the optimal feature subset. There is evidence that DBOA performs better in exploration, exploitation, and avoidance of local-optimum entrapment, while FIM scores new features for maximum relevance and minimum redundancy with respect to previously selected ones.
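The FIM scheme belongs to the broader family of mutual-information-based selection criteria. As an illustrative sketch only (not the authors' exact FIM formulation), a greedy mRMR-style selector scores each candidate feature by its relevance to the label minus its average redundancy with already-selected features; the helper names below (`mutual_info`, `greedy_mi_select`) are hypothetical:

```python
import numpy as np

def mutual_info(x, y):
    # Mutual information (in nats) between two discrete-valued arrays,
    # estimated from the empirical joint distribution.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(joint, (xi, yi), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def greedy_mi_select(X, y, k):
    # Greedily pick k features: maximize relevance MI(f, y) while
    # penalizing average redundancy MI(f, s) with selected features s.
    relevance = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        def score(j):
            if not selected:
                return relevance[j]
            red = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            return relevance[j] - red
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, given a feature that copies the label, a noisy copy, and pure noise, the exact copy is chosen first because it has maximal relevance.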
The performance of the proposed method is compared against ten baseline feature selection approaches on twenty publicly available datasets. The results reveal that IFS-DBOIM outperforms the other approaches on most datasets, maximizing the percent classification accuracy with the fewest features. The nonparametric Wilcoxon rank test confirms the statistical significance of these outcomes. Moreover, the method achieves the best trade-off between accuracy and stability.
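The Wilcoxon test used above compares paired per-dataset results of two methods without assuming normality. A minimal self-contained sketch of the exact two-sided signed-rank test follows; the accuracy numbers in the example are made up for illustration and are not the paper's results:

```python
from itertools import product

def wilcoxon_signed_rank(x, y):
    # Exact two-sided Wilcoxon signed-rank test for small paired samples
    # (enumerates all 2^n sign patterns, so use only for small n).
    # Returns (W, p), where W is the sum of ranks of positive differences.
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    # Rank absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j + 2) / 2  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_pos = sum(r for r, di in zip(ranks, d) if di > 0)
    # Null distribution: each sign pattern is equally likely.
    dist = [sum(r for r, s in zip(ranks, signs) if s)
            for signs in product([0, 1], repeat=n)]
    mean = sum(ranks) / 2
    extreme = sum(1 for w in dist if abs(w - mean) >= abs(w_pos - mean))
    return w_pos, extreme / len(dist)

# Hypothetical paired accuracies (%) of two methods on ten datasets.
acc_proposed = [91.2, 88.5, 93.1, 85.0, 90.4, 87.3, 94.2, 89.9, 86.1, 92.0]
acc_baseline = [89.0, 86.2, 91.5, 84.1, 88.0, 85.9, 92.9, 88.4, 85.0, 90.3]
w, p = wilcoxon_signed_rank(acc_proposed, acc_baseline)
```

When one method wins on every dataset, as in the hypothetical numbers above, the exact two-sided p-value is 2/2^n, well below the usual 0.05 threshold for n = 10.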