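Feature selection
Computer science
Benchmark (surveying)
Feature (linguistics)
Heuristic
Metric (unit)
Data mining
Selection (genetic algorithm)
Machine learning
Sample size determination
Artificial intelligence
Process (computing)
Sample (material)
Pattern recognition (psychology)
Mathematics
Statistics
Operating system
Geodesy
Philosophy
Economics
Chromatography
Linguistics
Chemistry
Geography
Operations management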
Authors
Hyun-Seok Shin, Sejong Oh
Identifier
DOI: 10.1186/s12859-024-06017-9
Abstract
High-dimensional datasets with low sample sizes (HDLSS) are pivotal in biology and bioinformatics. A core objective in analyzing HDLSS data is to select the most informative features while discarding redundant or irrelevant ones. This is particularly crucial in bioinformatics, where accurate feature (gene) selection can lead to breakthroughs in drug development and provide insights into disease diagnostics. Despite its importance, identifying optimal features in HDLSS data remains a significant challenge. To address this challenge, we propose an effective feature selection method that combines gradual permutation filtering with a heuristic tribrid search strategy, specifically tailored to HDLSS contexts. The proposed method considers inter-feature interactions and leverages feature rankings during the search process. In addition, we propose a new performance metric for HDLSS settings that evaluates both the number and the quality of the selected features. In comparisons with existing methods on benchmark data, the proposed method reduced the average number of selected features from 37.8 to 5.5 and improved the performance of the prediction model built on the selected features from 0.855 to 0.927. The proposed method thus selects a small number of important features while achieving high prediction performance.
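The abstract names gradual permutation filtering but does not describe how it works. As a rough illustration of the generic idea only, the sketch below uses classic permutation importance (shuffle one feature's values and measure how much a fitted model's accuracy drops) and applies it gradually: each round re-scores the surviving features and keeps the top `keep_fraction`. Everything here is an assumption rather than the authors' algorithm: the toy data, the nearest-centroid stand-in classifier, the halving schedule, and all names (`make_toy_data`, `permutation_scores`, `gradual_permutation_filter`, `keep_fraction`, `min_features`, `n_repeats`). It also omits the heuristic tribrid search, the handling of inter-feature interactions, and the proposed metric.

```python
import random

# Hypothetical sketch: generic permutation-importance filtering applied in rounds.
# This is NOT the authors' gradual permutation filtering; all names are illustrative.


def make_toy_data(n_samples=30, n_features=40, seed=1):
    """Hypothetical HDLSS toy data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        shift = 3.0 if label else -3.0
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += shift  # informative feature
        row[1] += shift  # informative feature
        X.append(row)
        y.append(label)
    return X, y


def fit_centroids(X, y, features):
    """Per-class mean of each feature in `features` (a nearest-centroid model)."""
    centroids = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = {j: sum(r[j] for r in rows) / len(rows) for j in features}
    return centroids


def accuracy(centroids, X, y, features):
    """Fraction of rows whose nearest class centroid is their own class."""
    correct = 0
    for x, label in zip(X, y):
        pred = min(centroids, key=lambda c: sum((x[j] - centroids[c][j]) ** 2 for j in features))
        correct += pred == label
    return correct / len(y)


def permutation_scores(X, y, features, n_repeats, rng):
    """Mean accuracy drop when each feature's column is shuffled."""
    centroids = fit_centroids(X, y, features)
    base = accuracy(centroids, X, y, features)
    scores = {}
    for j in features:
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            total_drop += base - accuracy(centroids, X_perm, y, features)
        scores[j] = total_drop / n_repeats
    return scores


def gradual_permutation_filter(X, y, keep_fraction=0.5, min_features=2, n_repeats=5, seed=0):
    """Repeatedly refit and re-score the surviving features, keeping the top fraction."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > min_features:
        scores = permutation_scores(X, y, features, n_repeats, rng)
        ranked = sorted(features, key=lambda j: scores[j], reverse=True)
        # Keep at least min_features, but always drop at least one feature per round.
        n_keep = min(max(min_features, int(len(features) * keep_fraction)), len(features) - 1)
        features = sorted(ranked[:n_keep])
    return features


X, y = make_toy_data()
print(gradual_permutation_filter(X, y))
```

Scoring on the same rows used to fit the centroids keeps the sketch short; a real pipeline would score permutations on held-out data or within cross-validation, because in-sample accuracy is optimistic when samples are scarce.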