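DBSCAN
Cluster analysis
Computer science
Scale (ratio)
Block (permutation group theory)
Data mining
Pattern recognition (psychology)
Artificial intelligence
Mathematics
Fuzzy clustering
Geography
Cartography
CURE data clustering algorithm
Geometry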
Authors
Yewang Chen,Lida Zhou,Songwen Pei,Zhiwen Yu,Yi Chen,Xin Liu,Ji‐Xiang Du,Naixue Xiong
Identifier
DOI:10.1109/tsmc.2019.2956527
Abstract
Large-scale data clustering is essential for big data problems. However, no existing approach is "optimal" for big data because of high complexity, so it remains a great challenge. In this article, a simple but fast approximate DBSCAN, namely, KNN-BLOCK DBSCAN, is proposed based on two findings: 1) the problem of identifying whether a point is a core point is, in fact, a kNN problem and 2) a point has a density distribution similar to that of its neighbors, and neighboring points are highly likely to be of the same type (core point, border point, or noise). KNN-BLOCK DBSCAN uses a fast approximate kNN algorithm, namely, FLANN, to detect core-blocks (CBs), noncore-blocks, and noise-blocks, within which all points have the same type. A fast algorithm for merging CBs and assigning noncore points to the proper clusters is then devised to speed up the clustering process. The experimental results show that KNN-BLOCK DBSCAN is an effective approximate DBSCAN algorithm with high accuracy, and that it outperforms other current variants of DBSCAN, including ρ-approximate DBSCAN and AnyDBC.
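
The first finding can be illustrated with a small sketch. Under the standard DBSCAN definition, a point is a core point when at least MinPts points (the point itself included) lie within distance ε of it, which is the same as saying that the distance to its MinPts-th nearest neighbor is at most ε. The Python code below is not the authors' implementation: the function names are hypothetical, and an exact brute-force search stands in for the approximate FLANN search named in the abstract.

```python
import heapq
import math


def kth_neighbor_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor.

    The point itself counts as the first neighbor (distance 0), matching
    the DBSCAN convention that a point belongs to its own neighborhood.
    Brute force is used here as an assumption; the paper uses FLANN.
    """
    dists = (math.dist(points[i], q) for q in points)
    return heapq.nsmallest(k, dists)[-1]


def is_core_knn(points, i, eps, min_pts):
    """Core-point test phrased as a kNN query."""
    if min_pts > len(points):
        return False  # not enough points to ever reach min_pts
    return kth_neighbor_distance(points, i, min_pts) <= eps


def is_core_range(points, i, eps, min_pts):
    """Classic core-point test: count the points inside the eps-ball."""
    count = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return count >= min_pts


# Four points on a unit square plus one far-away outlier.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
labels = [is_core_knn(pts, i, eps=1.0, min_pts=3) for i in range(len(pts))]
```

On this toy data the kNN test and the range-count test agree for every point: the four corners of the square are core points and the outlier is not. The block detection and merging steps described in the abstract are not reproduced here.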