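Computer science
Isolation (microbiology)
Construct (python library)
Exploit
Data mining
Set (abstract data type)
Time complexity
Anomaly detection
Algorithm
Sampling (signal processing)
Artificial intelligence
Pattern recognition (psychology)
Machine learning
Filter (signal processing)
Microbiology
Biology
Computer security
Programming language
Computer vision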
                    
Authors

Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou
         
            
    
            
        
                
Abstract

Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiling normal points. To the best of our knowledge, the concept of isolation has not been explored in the current literature. The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm that has linear time complexity with a low constant and a low memory requirement. Our empirical evaluation shows that iForest compares favourably with ORCA (a near-linear time complexity distance-based method), LOF, and random forests in terms of AUC and processing time, especially on large data sets. iForest also works well in high-dimensional problems that have a large number of irrelevant attributes, and in situations where the training set does not contain any anomalies.
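The isolation idea described in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the abstract does not spell out the tree construction, so the random attribute and split-value rule, the height limit of ceil(log2(sample size)), the normalising term `c(n)`, and the defaults of 100 trees and 256 sub-samples are assumptions taken from the commonly described isolation-tree formulation. All function names are hypothetical, and the sketch assumes at least two training points.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup
    over n points; used here to normalise path lengths (assumed form)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen attribute is constant
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Number of splits needed to reach a leaf, plus an adjustment for
    the points left unsplit in that leaf."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build each tree on a small random sub-sample of the data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    mean_depth = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c(psi))


# Toy data: a Gaussian cluster plus one far-away point.
demo_rng = random.Random(42)
data = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(500)]
data.append((8.0, 8.0))

trees, psi = fit_forest(data)
print("far point:", anomaly_score((8.0, 8.0), trees, psi))
print("centre:   ", anomaly_score((0.0, 0.0), trees, psi))
```

Because every tree sees only a fixed-size sub-sample and is cut off at a fixed height, the cost of building the forest in this sketch depends on the number of trees and the sub-sample size rather than on the full data set size, which is in the spirit of the low time and memory cost the abstract claims.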
         
            
 
                 
                
                    