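Anomaly detection
Outlier
Curse of dimensionality
Computer science
Clustering high-dimensional data
Range (aeronautics)
Ranking (information retrieval)
Rank (graph theory)
Data mining
Projection (relational algebra)
Artificial intelligence
Random projection
High dimensional
Pattern recognition (psychology)
Algorithm
Machine learning
Cluster analysis
Mathematics
Composite material
Combinatorics
Materials science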
Authors
Huawen Liu,Xuelong Li,Jiuyong Li,Shichao Zhang
Identifier
DOI:10.1109/tsmc.2017.2718220
Abstract
How to tackle the high dimensionality of data effectively and efficiently remains a challenging issue in machine learning. Identifying anomalous objects in given data has a broad range of real-world applications. Although many classical outlier detection and ranking algorithms have been proposed in recent years, two issues in outlier detection, high dimensionality and the size of the neighborhood, have not yet received sufficient attention. The former can trigger the distance concentration problem, in which the distances between observations in high-dimensional space become nearly indistinguishable; the latter requires appropriate parameter values, making models highly complex and more sensitive. To partially circumvent these problems, especially high dimensionality, we introduce a concept called the local projection score (LPS) to represent the degree to which an observation deviates from its neighbors. The LPS is obtained from neighborhood information by means of low-rank approximation. An observation with a high LPS is, with high probability, a promising outlier candidate. Based on this notion, we propose an efficient and effective outlier detection algorithm that is also robust to the parameter k of the k nearest neighbors. Extensive experiments on twelve public real-world data sets, comparing the proposed method with five popular outlier detection algorithms, show that its performance is competitive and promising.
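To make the LPS idea more concrete, here is a minimal Python sketch that relies on several assumptions. The abstract does not give the exact LPS formula. This code therefore assumes, hypothetically, that the score is the residual left after projecting an observation onto a rank-1 approximation of its k nearest neighbors: the neighbors' mean plus their leading principal direction. The names `local_projection_scores`, `_knn`, and `_top_direction` are illustrative and do not come from the paper. Neighbors are found by brute-force Euclidean search and the leading direction by power iteration, so this is not the authors' algorithm.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Point, b: Point) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Point) -> float:
    return math.sqrt(_dot(a, a))


def _knn(data: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of data[i] (brute-force Euclidean), excluding i."""
    dists = sorted((_norm(_sub(data[i], data[j])), j) for j in range(len(data)) if j != i)
    return [j for _, j in dists[:k]]


def _top_direction(rows: List[List[float]], iters: int = 50) -> List[float]:
    """Leading principal direction of already-centred rows, via power iteration on X^T X."""
    start = max(rows, key=_norm)  # heuristic start vector: the longest centred row
    n = _norm(start)
    if n == 0.0:
        # Degenerate neighbourhood (all neighbours coincide): fall back to the first axis.
        return [1.0] + [0.0] * (len(start) - 1)
    v = [c / n for c in start]
    for _ in range(iters):
        w = [0.0] * len(v)
        for row in rows:  # accumulate w = X^T (X v)
            s = _dot(row, v)
            for d, c in enumerate(row):
                w[d] += s * c
        n = _norm(w)
        if n == 0.0:
            break
        v = [c / n for c in w]
    return v


def local_projection_scores(data: Sequence[Point], k: int = 3) -> List[float]:
    """Hypothetical LPS-style score: distance from each point to the rank-1 affine
    approximation (mean + leading direction) of its k nearest neighbours."""
    if not 1 <= k < len(data):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    scores = []
    for i, x in enumerate(data):
        nbrs = [data[j] for j in _knn(data, i, k)]
        mean = [sum(p[d] for p in nbrs) / k for d in range(len(x))]
        centred = [_sub(p, mean) for p in nbrs]
        v = _top_direction(centred)
        r = _sub(x, mean)
        proj = _dot(r, v)
        # Residual of x after projecting onto the neighbours' low-rank subspace.
        scores.append(_norm([rd - proj * vd for rd, vd in zip(r, v)]))
    return scores


if __name__ == "__main__":
    # Ten points on the line y = x plus one point well off that line.
    points = [(float(t), float(t)) for t in range(10)] + [(5.0, -5.0)]
    scores = local_projection_scores(points, k=3)
    top = max(range(len(points)), key=lambda i: scores[i])
    print(top, round(scores[top], 2))  # → 10 7.07
```

In the usage example, the ten points on the line y = x score close to zero because each one lies in the subspace spanned by its neighbors. The off-line point at index 10 receives the largest score. The rank-1 choice is only for brevity. Neighborhoods that vary along more than one direction would need a higher rank or a different low-rank technique.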