Entropy (arrow of time)
Computer science
Robustness (evolution)
Feature selection
Mutual information
Pattern recognition (psychology)
Artificial intelligence
Algorithm
Shrinkage
Locality-sensitive hashing
Data mining
Hash function
Hash table
Biochemistry
Chemistry
Physics
Computer security
Quantum mechanics
Gene
Programming language
Authors
Andrea Mariello, Roberto Battiti
Identifier
DOI: 10.1109/tnnls.2018.2830700
Abstract
In feature selection, a measure that captures nonlinear relationships between features and class is the mutual information (MI), which is based on how information in the features reduces the uncertainty in the output. In this paper, we propose a new measure related to MI, called neighborhood entropy, and a novel filter method based on its minimization in a greedy procedure. Our algorithm integrates sequential forward selection with approximate nearest-neighbors techniques and locality-sensitive hashing. Experiments show that the classification accuracy is usually higher than that of other state-of-the-art algorithms, with the best results obtained on problems that are highly unbalanced and nonlinearly separable. The order in which the features are selected is also better, leading to higher accuracy with fewer features. The experimental results indicate that our technique can be employed effectively in offline scenarios, where one can dedicate more CPU time to achieve superior results and greater robustness to noise and class imbalance.
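The greedy wrapper the abstract describes can be sketched in outline. This is a minimal illustration only: the paper's actual criterion is the neighborhood entropy estimated with approximate nearest neighbors and locality-sensitive hashing, which is replaced here by plain discrete mutual information between the selected feature tuple and the class; the function names `forward_select` and `mutual_information` are illustrative, not from the paper.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Discrete MI I(X;Y) estimated from paired samples by counting."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def forward_select(rows, labels, k):
    """Sequential forward selection: at each step, add the feature
    whose inclusion maximizes the MI between the tuple of selected
    feature values and the class labels (a stand-in for minimizing
    the paper's neighborhood entropy)."""
    n_feats = len(rows[0])
    selected = []
    for _ in range(k):
        best, best_mi = None, -1.0
        for j in range(n_feats):
            if j in selected:
                continue
            candidate = selected + [j]
            xs = [tuple(r[i] for i in candidate) for r in rows]
            mi = mutual_information(xs, labels)
            if mi > best_mi:
                best, best_mi = j, mi
        selected.append(best)
    return selected
```

On a toy dataset where feature 0 copies the label and the others are uninformative, `forward_select(rows, labels, 1)` returns `[0]`, showing the selection order the criterion induces.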