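Computer science
Outlier
Anomaly detection
Cumulative distribution function
Data mining
Empirical distribution function
Artificial intelligence
Pattern recognition (psychology)
Probability density function
Statistics
Mathematics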
Authors
Zheng Li,Yue Zhao,Xiyang Hu,Nicola Botta,Cezar Ionescu,George H. Chen
Identifier
DOI:10.1109/tkde.2022.3159580
Abstract
Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
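The three steps named in the abstract (per-dimension empirical CDF, per-dimension tail probabilities, aggregation across dimensions) can be illustrated with a minimal NumPy sketch. This is not the authors' released implementation: the function name `ecdf_tail_scores` is hypothetical, and the aggregation rule used here (summing negative log tail probabilities over dimensions and taking the larger of the left-tail and right-tail sums) is an assumption, since the abstract does not spell out the exact rule.

```python
import numpy as np


def ecdf_tail_scores(X):
    """Simplified, ECOD-style outlier scores (illustrative sketch only).

    Assumption: scores aggregate -log(tail probability) over dimensions,
    and the final score is the larger of the left- and right-tail sums.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    left = np.empty_like(X)   # fraction of samples <= x, per dimension
    right = np.empty_like(X)  # fraction of samples >= x, per dimension
    for j in range(d):
        col = np.sort(X[:, j])
        # Empirical CDF evaluated at each sample (left-tail probability).
        left[:, j] = np.searchsorted(col, X[:, j], side="right") / n
        # Empirical survival function including ties (right-tail probability).
        right[:, j] = (n - np.searchsorted(col, X[:, j], side="left")) / n
    # Both tail probabilities are at least 1/n, so the logs are finite.
    left_score = -np.log(left).sum(axis=1)
    right_score = -np.log(right).sum(axis=1)
    return np.maximum(left_score, right_score)


# Toy data: the last row is extreme in both dimensions.
X = np.array([[1, 4], [2, 3], [3, 2], [4, 1], [10, 10]])
scores = ecdf_tail_scores(X)
most_outlying = int(np.argmax(scores))  # index of the highest-scoring row
```

Because the sketch relies only on ranks within each dimension, it has no hyperparameters to tune, and each dimension's contribution to a point's score can be inspected separately, which matches the parameter-free and interpretable character the abstract claims for ECOD.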