Evolutionary Computation for Feature Selection in Classification

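Feature selection; Fitness function; Artificial intelligence; Computer science; Evolutionary computation; Particle swarm optimization; Feature (linguistics); Selection (genetic algorithm); Filter (signal processing); Machine learning; Pattern recognition (psychology); Data mining; Evolutionary algorithm; Genetic algorithm; Philosophy; Linguistics; Computer vision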
Author
Hoai Nam Nguyen
Identifier
DOI:10.26686/wgtn.17134145
Abstract

<p>Classification aims to identify the class label of an instance from the information in its characteristics, or features. Unfortunately, many classification problems have a large feature set containing irrelevant and redundant features, which reduce classification performance. To address this problem, feature selection is used to select a small subset of relevant features. There are three main types of feature selection methods: wrapper, embedded, and filter approaches. Wrappers use a classification algorithm to evaluate candidate feature subsets. In embedded approaches, the selection process is embedded in the training process of a classification algorithm. Unlike the other two approaches, filters do not involve any classification algorithm during the selection process. Feature selection is an important process, but it is not an easy task because of its large search space and complex feature interactions. Because of its potential global search ability, Evolutionary Computation (EC), especially Particle Swarm Optimization (PSO), has been widely and successfully applied to feature selection. However, there is potential to improve the effectiveness and efficiency of EC-based feature selection. The overall goal of this thesis is to investigate and improve the capability of EC for feature selection to select small feature subsets while maintaining or even improving the classification performance compared to using all features. Different aspects of feature selection are considered in this thesis, such as the number of objectives (single-objective/multi-objective), the fitness function (filter/wrapper), and the search mechanism. This thesis introduces a new fitness function based on mutual information, which is calculated by an estimation approach instead of the traditional counting approach. Results show that the estimation approach works well on both continuous and discrete data.
More importantly, mutual information calculated by the estimation approach can capture feature interactions better than the traditional counting approach. This thesis develops a novel binary PSO algorithm, which is the first work to redefine some core concepts of PSO, such as velocity and momentum, to suit the characteristics of binary search spaces. Experimental results show that the proposed binary PSO algorithm evolves better solutions than other binary EC algorithms when the search spaces are large and complex. Specifically, on feature selection, the proposed binary PSO algorithm can select smaller feature subsets with similar or better classification accuracies, especially when there are a large number of features. This thesis proposes surrogate models for wrapper-based feature selection. The surrogate models use surrogate training sets, which are subsets of informative instances selected from the training set. Experimental results show that the proposed surrogate models help PSO reduce the computational cost while maintaining or even improving the classification performance compared to using only the original training set. The thesis develops the first wrapper-based multi-objective feature selection algorithm using MOEA/D. A new decomposition strategy using multiple reference points for MOEA/D is designed, which can deal with different characteristics of multi-objective feature selection, such as highly discontinuous Pareto fronts and complex relationships between objectives. The experimental results show that the proposed algorithm can evolve more diverse non-dominated sets than other multi-objective algorithms. This thesis introduces the first PSO-based feature selection algorithm for transfer learning. In the proposed algorithm, the fitness function uses classification performance to reduce the differences between domains while maintaining the discriminative ability on the target domain.
The experimental results show that the proposed algorithm can select feature subsets which achieve better classification performance than four state-of-the-art feature-based transfer learning algorithms.</p>
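The abstract contrasts an estimation approach to mutual information with the traditional counting approach, but does not spell out either. As a rough illustration of the counting approach only, the sketch below computes I(X; Y) from frequency counts over discrete values. The function name `mutual_information` and the toy data are assumptions made for this example; they are not taken from the thesis, and the thesis's estimation-based fitness function is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Counting-based I(X; Y) in bits for two equal-length discrete sequences.

    Hypothetical helper for illustration; not the thesis's estimator.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    count_x = Counter(xs)         # marginal counts of x
    count_y = Counter(ys)         # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Toy data (assumed): the label is the XOR of two binary features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [0, 1, 1, 0]

mi_f1 = mutual_information(f1, label)                  # single feature
mi_f2 = mutual_information(f2, label)                  # single feature
mi_pair = mutual_information(list(zip(f1, f2)), label)  # both features jointly
```

In this toy case each feature alone carries no information about the label, while the pair determines it completely. That is a small instance of the feature interactions the abstract mentions. Scoring feature subsets jointly by counting becomes unreliable as the subset grows, because the number of distinct value combinations quickly outstrips the number of instances, and continuous features must first be discretized.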