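Cluster analysis
Computer science
Data mining
Path (computing)
Mutual information
Affinity propagation
Pattern recognition (psychology)
Single-linkage clustering
Object (grammar)
Correlation clustering
Cluster (spacecraft)
Clustering high-dimensional data
Measure (data warehouse)
Manifold (fluid mechanics)
Artificial intelligence
CURE data clustering algorithm
Engineering
Programming language
Mechanical engineering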
Authors
Jun Chen, Xinzhong Zhu, Huawen Liu
Identifier
DOI:10.1016/j.compbiomed.2022.106184
Abstract
Clustering analysis is widely used in real-world applications. Because of its simplicity, K-means has become the most popular clustering technique in practice. Unfortunately, the performance of K-means relies heavily on the initial centers, which must be specified in advance. In addition, it cannot effectively identify manifold clusters. In this paper, we propose a novel clustering algorithm based on representative data objects derived from mutual neighbors to identify clusters of different shapes. Specifically, it first obtains mutual neighbors to estimate the density of each data object, and then identifies high-density representative objects to represent the whole data set. Moreover, a concept of path distance, derived from a minimum spanning tree, is introduced to measure the similarity of representative objects on manifold structures. Finally, an improved K-means with initial centers and path-based distances is proposed to group the representative objects into clusters. For non-representative objects, cluster labels are determined from neighborhood information. To verify the effectiveness of the proposed method, we conducted comparison experiments on synthetic data and further applied it to medical scenarios. The results show that our clustering method can effectively identify arbitrary-shaped clusters and disease types compared with state-of-the-art clustering methods.
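The abstract does not give the exact definitions of density or path distance, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that density can be approximated by the number of mutual k-nearest neighbors of a point, and that the path distance is the common minimax variant, i.e. the largest edge on the minimum-spanning-tree path between two points. The function names `mutual_neighbor_counts` and `mst_path_distances` are hypothetical.

```python
import math


def mutual_neighbor_counts(points, k):
    """Count mutual k-nearest neighbors per point (assumed density proxy)."""
    n = len(points)
    knn = []
    for i in range(n):
        # Indices of all other points, ordered by Euclidean distance to point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        knn.append(set(order[:k]))
    # j is a mutual neighbor of i when each is in the other's k-NN set.
    return [sum(1 for j in knn[i] if i in knn[j]) for i in range(n)]


def mst_path_distances(points):
    """Assumed path distance: largest edge on the MST path between each pair."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest edge linking each point to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    path = [[0.0] * n for _ in range(n)]
    visited = []
    for _ in range(n):
        # Prim's algorithm: attach the closest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        # The tree path from u to any earlier point v runs through parent[u].
        for v in visited:
            d = max(path[v][parent[u]], best[u])
            path[u][v] = path[v][u] = d
        visited.append(u)
        for w in range(n):
            if not in_tree[w]:
                d = math.dist(points[u], points[w])
                if d < best[w]:
                    best[w] = d
                    parent[w] = u
    return path


# Toy data: a chain of three points and a distant pair.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
counts = mutual_neighbor_counts(points, k=2)
path = mst_path_distances(points)
print(counts)       # [2, 2, 2, 1, 1]
print(path[0][2])   # 1.0 (the Euclidean distance is 2.0)
print(path[0][4])   # 8.0 (the gap between the two groups)
```

Under these assumptions, points along the same chain stay close in path distance even when they are far apart in Euclidean terms, which is why a K-means-style grouping over such distances can follow manifold-shaped clusters.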