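Computer science
Feature selection
Scalability
Data mining
Feature (linguistics)
Heuristic
Algorithm
Pattern recognition (psychology)
Artificial intelligence
Linguistics
Philosophy
Database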
Authors
Chuan Luo,Sizhao Wang,Tianrui Li,Hongmei Chen,Jiancheng Lv,Lei Zhang
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems
[Institute of Electrical and Electronics Engineers]
Date: 2022-05-12
Volume/Issue: : 1-15
Citations: 3
Identifier
DOI:10.1109/tnnls.2022.3171614
Abstract
The selection of prominent features for building more compact and efficient models is an important data preprocessing task in the field of data mining. The rough hypercuboid approach is an emerging technique that can be applied to eliminate irrelevant and redundant features, especially for the inexactness problem in approximate numerical classification. By integrating the meta-heuristic-based evolutionary search technique, a novel global search method for numerical feature selection is proposed in this article based on the hybridization of the rough hypercuboid approach and binary particle swarm optimization (BPSO) algorithm, namely RH-BPSO. To further alleviate the issue of high computational cost when processing large-scale datasets, parallelization approaches for calculating the hybrid feature evaluation criteria are presented by decomposing and recombining the hypercuboid equivalence partition matrix via horizontal data partitioning. A distributed meta-heuristic optimized rough hypercuboid feature selection (DiRH-BPSO) algorithm is thus developed and embedded in the Apache Spark cloud computing model. Extensive experimental results indicate that RH-BPSO is promising and can significantly outperform the other representative feature selection algorithms in terms of classification accuracy, the cardinality of the selected feature subset, and execution efficiency. Moreover, experiments on distributed-memory multicore clusters show that DiRH-BPSO is significantly faster than its sequential counterpart and is perfectly capable of completing large-scale feature selection tasks that fail on a single node due to memory constraints. Parallel scalability and extensibility analyses also demonstrate that DiRH-BPSO can scale out and extend well with the growth of computational nodes and the volume of data.
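For readers unfamiliar with binary particle swarm optimization, the search component named in the abstract can be sketched roughly as follows. This is a generic, textbook-style BPSO with a sigmoid transfer function, not the authors' RH-BPSO: the abstract does not define the rough hypercuboid evaluation criteria, so `toy_fitness` below is a purely hypothetical stand-in, and the function name `bpso`, its parameters, and their default values are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    # Clamp the input so math.exp cannot overflow.
    x = max(-60.0, min(60.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def bpso(fitness, n_features, n_particles=20, n_iters=50,
         w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic binary PSO that maximizes `fitness` over 0/1 feature masks.

    All hyperparameter defaults are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    # Each particle is a bit mask: 1 means the feature is selected.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The sigmoid maps velocity to the probability of a 1 bit.
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in criterion: reward two "relevant" features and
# penalize subset size. The paper's criterion would go here instead.
RELEVANT = {0, 3}


def toy_fitness(mask):
    hits = sum(mask[j] for j in RELEVANT)
    return hits - 0.1 * sum(mask)


best, best_fit = bpso(toy_fitness, n_features=6)
print(best, best_fit)
```

In a setting like the one the abstract describes, the fitness call is the expensive step, which is presumably why the authors parallelize the evaluation criteria over horizontally partitioned data rather than the swarm update itself.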