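Subspace topology
Outlier
Computer science
Anomaly detection
Linear subspace
Time complexity
Hash function
Algorithm
Set (abstract data type)
Data mining
Pattern recognition (psychology)
Artificial intelligence
Mathematics
Geometry
Computer security
Programming language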
Authors
Saket Sathe, Charu C. Aggarwal
Identifier
DOI:10.1109/icdm.2016.0057
Abstract
Outlier detection algorithms are often computationally intensive because of their need to score each point in the data. Even simple distance-based algorithms have quadratic complexity. High-dimensional outlier detection algorithms such as subspace methods are often even more computationally intensive because of their need to explore different subspaces of the data. In this paper, we propose an exceedingly simple subspace outlier detection algorithm, which can be implemented in a few lines of code, and whose complexity is linear in the size of the data set and the space requirement is constant. We show that this outlier detection algorithm is much faster than both conventional and high-dimensional algorithms and also provides more accurate results. The approach uses randomized hashing to score data points and has a neat subspace interpretation. Furthermore, the approach can be easily generalized to data streams. We present experimental results showing the effectiveness of the approach over other state-of-the-art methods.
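The abstract gives no pseudocode, so the following is only a minimal sketch of the general idea it describes (scoring points by randomized hashing over random subspaces), not the authors' published algorithm. The function name `hashed_subspace_scores`, its parameters, the grid-width range, and the min-max normalisation are all assumptions made for illustration. Each ensemble component picks a random subset of dimensions, overlays a randomly shifted grid, hashes every point's grid cell into a fixed-size count table, and scores the point by the log of its bucket count; points that repeatedly fall into sparse buckets receive low scores.

```python
import math
import random


def hashed_subspace_scores(points, n_components=50, table_size=4096, seed=0):
    """Return one score per point; a lower score means more outlying.

    Illustrative sketch only: names and parameter choices are assumptions,
    not taken from the paper.
    """
    rng = random.Random(seed)
    n, d = len(points), len(points[0])

    # Min-max normalise every dimension to [0, 1] (constant dims map to 0).
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]
    unit = [[(p[j] - lo[j]) / span[j] for j in range(d)] for p in points]

    totals = [0.0] * n
    for _ in range(n_components):
        # Random subspace, random grid width, random per-dimension shift.
        dims = rng.sample(range(d), rng.randint(1, d))
        width = rng.uniform(0.1, 0.5)
        shifts = {j: rng.uniform(0.0, width) for j in dims}
        salt = rng.getrandbits(32)  # makes each component's hash different

        def bucket(x):
            cell = tuple(math.floor((x[j] + shifts[j]) / width) for j in dims)
            return hash((salt, cell)) % table_size

        # Fixed-size count table: its size does not grow with the data.
        counts = [0] * table_size
        keys = [bucket(x) for x in unit]
        for k in keys:
            counts[k] += 1

        # Each point counted itself, so counts[k] >= 1 and the log is defined.
        for i, k in enumerate(keys):
            totals[i] += math.log2(counts[k])

    return [t / n_components for t in totals]
```

Calling `hashed_subspace_scores(points)` on a list of equal-length numeric tuples returns one score per point, and sorting by ascending score ranks the most outlying points first. The work per component is one pass over the data, so the total cost grows linearly with the number of points. The `keys` list is kept only for convenience; recomputing each bucket in a second pass would leave the count table as the only per-component state.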