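DBSCAN
Computer science
Cluster analysis
k-nearest neighbors algorithm
Pattern recognition (psychology)
Nearest-neighbor chain algorithm
Artificial intelligence
Graph
Algorithm
Nearest neighbor search
Scalability
Data mining
CURE data clustering algorithm
Correlation clustering
Theoretical computer science
Canopy clustering algorithm
Database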
Authors
Avory C. Bryant, Krzysztof J. Cios
Identifier
DOI:10.1109/tkde.2017.2787640
Abstract
A new density-based clustering algorithm, RNN-DBSCAN, is presented that uses reverse nearest neighbor counts as an estimate of observation density. Clustering is performed using a DBSCAN-like approach based on k nearest neighbor graph traversals through dense observations. RNN-DBSCAN is preferable to the popular density-based clustering algorithm DBSCAN in two respects: first, it reduces problem complexity to a single parameter (the choice of k, the number of nearest neighbors); second, it better handles large variations in cluster density (heterogeneous density). The superiority of RNN-DBSCAN is demonstrated on several artificial and real-world datasets relative to prior reverse nearest neighbor based clustering approaches (RECORD, IS-DBSCAN, and ISB-DBSCAN), as well as DBSCAN and OPTICS. Each of these clustering approaches is described by a common graph-based interpretation in which clusters of dense observations are defined as connected components, and their computational complexity is discussed. Heuristics for RNN-DBSCAN parameter selection are presented, and the effects of k on RNN-DBSCAN clusterings are discussed. Additionally, to address scalability, an approximate version of RNN-DBSCAN is presented that leverages an existing approximate k nearest neighbor technique.
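The core idea described in the abstract (reverse nearest neighbor counts as a density estimate, then clusters as connected components of the k nearest neighbor graph restricted to dense observations) can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' RNN-DBSCAN: the density threshold (an observation is dense when it has at least k reverse nearest neighbors), the rule that a non-dense observation joins the cluster of its nearest dense neighbor, and the function names `knn`, `reverse_counts`, and `rnn_dbscan_sketch` are all hypothetical choices made here. Neighbor search is brute force, O(n²), rather than the approximate k nearest neighbor technique mentioned for scalability.

```python
import math
from collections import deque


def knn(points, k):
    """Brute-force k nearest neighbors of each point (itself excluded), nearest first."""
    result = []
    for i, p in enumerate(points):
        # Sort the other points by (distance, index) so ties break deterministically.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def reverse_counts(neighbors):
    """|R_k(x)|: number of points that list x among their k nearest neighbors."""
    counts = [0] * len(neighbors)
    for nbrs in neighbors:
        for j in nbrs:
            counts[j] += 1
    return counts


def rnn_dbscan_sketch(points, k):
    """Simplified illustration, NOT the paper's exact algorithm.

    Assumptions: a point is dense if it has >= k reverse nearest neighbors;
    clusters are connected components of the kNN graph over dense points;
    a non-dense point joins the cluster of its nearest dense kNN, else it
    is labeled noise (-1).
    """
    nbrs = knn(points, k)
    dense = [c >= k for c in reverse_counts(nbrs)]

    # Undirected kNN graph restricted to dense points.
    adj = [set() for _ in points]
    for i, ns in enumerate(nbrs):
        for j in ns:
            if dense[i] and dense[j]:
                adj[i].add(j)
                adj[j].add(i)

    # Breadth-first traversal labels each connected component of dense points.
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if not dense[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in adj[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1

    # Border assignment: nbrs[i] is sorted nearest first, so take the first dense one.
    for i in range(len(points)):
        if not dense[i]:
            for j in nbrs[i]:
                if dense[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

For example, on two well-separated 3×3 grids of integer points (the second shifted by (10, 10)) with k = 3, every point of the first grid receives label 0 and every point of the second receives label 1. Because each observation contributes exactly k outgoing neighbor links, the reverse counts always sum to n·k, so the average reverse-neighbor count is k; this is one reason a threshold tied to k is a natural fit for a single-parameter method. The simplified border rule absorbs an isolated outlier whenever any of its k nearest neighbors is dense, so it does not reproduce the paper's treatment of noise.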