Computer science
Rough set
Feature selection
Artificial intelligence
Pattern recognition (psychology)
Knowledge representation and reasoning
Representation (politics)
Feature (linguistics)
Selection (genetic algorithm)
Data mining
Linguistics
Philosophy
Politics
Political science
Law
Authors
Shuyin Xia, Xinyu Bai, Guoyin Wang, Yunlong Cheng, Deyu Meng, Xinbo Gao, Yujia Zhai, Elisabeth Giem
Identifier
DOI: 10.1109/tkde.2022.3220200
Abstract
This paper presents a powerful data-mining method based on rough sets that simultaneously performs feature selection, classification, and knowledge representation. Although the rough set is a popular feature-selection method with good interpretability, it is not efficient or accurate enough to handle large-scale, high-dimensional datasets, which prevents it from being applied directly to real-world scenarios. To address the efficiency issue, we discover the stability of the local redundancy (SLR) of attributes and propose a theorem to prove it rigorously. Based on SLR, only the objects in the boundary region are partitioned when calculating outer significance, which further improves the efficiency of the rough set. With regard to accuracy, we show that overfitting can make the rough set ineffective, especially when processing noisy attributes. We then propose relative importance, a robust measurement of an attribute, to alleviate this overfitting. Building on these results, we propose a novel rough-set framework that significantly improves the efficiency and accuracy of existing rough-set methods, and we further extend the framework with a "rough concept tree" for knowledge representation and classification. Experimental results on public benchmark datasets show that the proposed framework achieves higher accuracy than seven state-of-the-art feature-selection methods. All code is available at https://github.com/syxiaa/powerroughset.
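For context, the sketch below shows the classical rough-set computation that frameworks like this one accelerate: equivalence classes under an attribute subset, the positive region, the dependency degree (gamma), and a greedy forward search for a reduct. It is a minimal illustration of the baseline technique only, not the paper's SLR-based method or the powerroughset code; all function names and the toy decision table are assumptions made for this example.

```python
# Minimal sketch of classical rough-set dependency-based feature selection.
# Illustrative only: this is the standard baseline, NOT the paper's accelerated method.
from collections import defaultdict

def partition(objects, attrs):
    """Group objects (dicts of attribute -> value) into equivalence classes
    induced by the indiscernibility relation over `attrs`."""
    blocks = defaultdict(list)
    for i, obj in enumerate(objects):
        blocks[tuple(obj[a] for a in attrs)].append(i)
    return list(blocks.values())

def positive_region(objects, attrs, decision):
    """Indices of objects whose equivalence class is consistent w.r.t. the decision."""
    pos = set()
    for block in partition(objects, attrs):
        if len({objects[i][decision] for i in block}) == 1:
            pos.update(block)  # whole block lies in a single decision class
    return pos

def dependency(objects, attrs, decision):
    """gamma(attrs) = |POS_attrs(decision)| / |U|"""
    return len(positive_region(objects, attrs, decision)) / len(objects)

def greedy_reduct(objects, candidate_attrs, decision):
    """Forward greedy search: repeatedly add the attribute with the largest
    gain in gamma until the dependency of the full attribute set is reached."""
    selected, best = [], 0.0
    target = dependency(objects, candidate_attrs, decision)
    remaining = list(candidate_attrs)
    while best < target and remaining:
        gain, attr = max((dependency(objects, selected + [a], decision), a)
                         for a in remaining)
        if gain <= best:  # no remaining attribute improves gamma
            break
        selected.append(attr)
        remaining.remove(attr)
        best = gain
    return selected, best

if __name__ == "__main__":
    # Tiny toy decision table (values purely illustrative).
    U = [
        {"a": 0, "b": 1, "c": 0, "d": "yes"},
        {"a": 0, "b": 1, "c": 1, "d": "yes"},
        {"a": 1, "b": 0, "c": 0, "d": "no"},
        {"a": 1, "b": 1, "c": 1, "d": "no"},
    ]
    print(greedy_reduct(U, ["a", "b", "c"], "d"))  # -> (['a'], 1.0)
```

The abstract's efficiency contribution can be read against this baseline: instead of re-partitioning all objects for every candidate attribute when computing significance, the proposed SLR result restricts the partitioning to objects in the boundary region, which shrinks the dominant cost of the greedy loop.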