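Merge (version control)
Computer science
Pruning
Metric (unit)
Preference
Artificial intelligence
Data mining
Machine learning
Information retrieval
Mathematics
Statistics
Engineering
Agronomy
Operations management
Biology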
Authors
Xiangjun Dong,Ping Qiu,Jinhu Lü,Longbing Cao,Tiantian Xu
Identifier
DOI:10.1109/tnnls.2018.2886199
Abstract
As an important tool for behavior informatics, negative sequential patterns (NSPs) (such as missing a medical treatment) are sometimes much more informative than positive sequential patterns (PSPs) (e.g., attending a medical treatment) in many applications. However, NSP mining is at an early stage and faces many challenging problems, including 1) how to mine an expected number of NSPs; 2) how to select useful NSPs; and 3) how to reduce high time consumption. To solve the first problem, we propose an algorithm, Topk-NSP, to mine the k most frequent negative patterns. In Topk-NSP, we first mine the top-k PSPs using existing methods, and then use an idea similar to top-k PSP mining to mine the top-k NSPs from these PSPs. To solve the remaining two problems, we propose three optimization strategies for Topk-NSP. The first strategy considers the influence of PSPs when selecting useful top-k NSPs: we introduce two weights, w_P and w_N, to express the user preference degree for NSPs and PSPs, respectively, and select useful NSPs by a weighted support, wsup. The second strategy merges wsup with an interestingness metric to select more useful NSPs. The third strategy introduces pruning to reduce the high computational cost of Topk-NSP. Finally, we propose an optimized algorithm, Topk-NSP+. To the best of our knowledge, Topk-NSP+ is the first algorithm that can mine the top-k useful NSPs. Experimental results on four synthetic and two real-life data sets show that Topk-NSP+ is very efficient in mining the top-k NSPs in terms of computational cost and scalability.
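The abstract does not give the formula for wsup or the details of the pruning rule, so the following is only a rough sketch of the general idea of ranking candidates by a weighted support and keeping the best k. It assumes, purely for illustration, that the weighted support is a linear blend of a negative pattern's own support and the support of its corresponding positive pattern. The names `weighted_support`, `top_k_by_wsup`, `w_neg`, and `w_pos` are hypothetical and are not taken from the paper, and the mapping of these weights onto the paper's w_P and w_N is not asserted.

```python
import heapq
from typing import Dict, List, Tuple


def weighted_support(sup_neg: float, sup_pos: float,
                     w_neg: float, w_pos: float) -> float:
    """Assumed form of a weighted support: a linear blend of the negative
    pattern's support and its positive counterpart's support.
    This formula is an illustration, not the paper's definition."""
    return w_neg * sup_neg + w_pos * sup_pos


def top_k_by_wsup(candidates: Dict[str, Tuple[float, float]], k: int,
                  w_neg: float = 0.5, w_pos: float = 0.5
                  ) -> List[Tuple[float, str]]:
    """Return the k candidates with the highest weighted support.

    `candidates` maps a pattern label to (sup_neg, sup_pos).
    A size-k min-heap holds the current best k; its root is the running
    threshold, so any candidate scoring at or below it is skipped.
    """
    heap: List[Tuple[float, str]] = []
    for pattern, (sup_neg, sup_pos) in candidates.items():
        score = weighted_support(sup_neg, sup_pos, w_neg, w_pos)
        if len(heap) < k:
            heapq.heappush(heap, (score, pattern))
        elif score > heap[0][0]:
            # Candidate beats the current k-th best: replace the root.
            heapq.heapreplace(heap, (score, pattern))
    # Highest weighted support first.
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    # Made-up supports; "~" marks a negated (absent) element.
    demo = {
        "<a ~b c>": (0.5, 0.75),
        "<~a b>": (0.25, 0.5),
        "<a ~c>": (0.75, 0.75),
        "<~b c>": (0.25, 0.25),
    }
    for score, pattern in top_k_by_wsup(demo, k=2):
        print(pattern, score)
```

The min-heap keeps memory at O(k) and gives a running threshold, which is the usual way a top-k miner avoids fixing a minimum support in advance. An actual implementation along the lines of the abstract would also fold in the interestingness metric and the pruning step, neither of which is modeled here.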