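Outlier
Curse of dimensionality
Computer science
Ranking (information retrieval)
Anomaly detection
Data set
Set (abstract data type)
Euclidean distance
Data mining
Data point
Variance (accounting)
Artificial intelligence
Clustering high-dimensional data
Euclidean space
Task (project management)
Point (geometry)
Dimension (graph theory)
Pattern recognition (psychology)
Mathematics
Cluster analysis
Geometry
Accounting
Management
Economics
Pure mathematics
Business
Programming language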
Authors
Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek
Identifier
DOI:10.1145/1401890.1401946
Abstract
Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious "curse of dimensionality". In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the "curse of dimensionality" are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF for various artificial and a real world data set and show ABOD to perform especially well on high-dimensional data.