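Cluster (spacecraft)
Intersection (aeronautics)
Cluster analysis
Compact space
Index (typography)
Computer science
Data mining
k-nearest neighbors algorithm
Basis (linear algebra)
Complete-linkage clustering
Mathematics
Pattern recognition (psychology)
Artificial intelligence
Fuzzy clustering
Geography
Cartography
Canopy clustering algorithm
World Wide Web
Programming language
Pure mathematics
Geometry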
Authors
Xinjie Duan, Yan Ma, Yuqing Zhou, Hui Huang, Bin Wang
Identifier
DOI: 10.1016/j.eswa.2023.119784
Abstract
In practical applications, the true number of clusters in a dataset is rarely known in advance, so a cluster validity index is needed to evaluate clustering results and determine the optimal number of clusters. However, the performance of existing cluster validity indices is sensitive to factors such as cluster shape and density. To address these issues, this paper proposes a new cluster validity index based on augmented non-shared nearest neighbors (ANCV). The ANCV index rests on two principles: (1) within-cluster compactness can be measured by the distances between pairs of data points that share few nearest neighbors; (2) between-cluster separation can be estimated from the distances between pairs of data points located at the intersection of clusters. Building on these principles, each such point pair is extended with its augmented non-shared nearest neighbors to form a small cluster. The average distances within and between these small clusters are then computed to estimate within-cluster compactness and between-cluster separation, respectively. Finally, the optimal number of clusters is determined from the difference between between-cluster separation and within-cluster compactness. Experimental results on 12 two-dimensional synthetic datasets and 10 real-world datasets from the UCI repository show that the ANCV index performs best among all compared indices.
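The abstract does not give ANCV's formulas, so the following is only a minimal Python sketch of the general "separation minus compactness" idea it describes, under stated assumptions. It uses shared nearest neighbours (SNN) to choose within-cluster point pairs, approximates between-cluster separation by the closest point pair of each pair of clusters, and omits the augmented non-shared nearest neighbour extension entirely. The function names `knn_indices`, `snn_validity_sketch` and `best_cluster_number`, and the neighbourhood size `k`, are hypothetical; `k` here is the number of neighbours per point, not the number of clusters.

```python
import math
from itertools import combinations


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (itself excluded).

    Ties in distance are broken by index so the result is deterministic.
    """
    n = len(points)
    result = []
    for i in range(n):
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        result.append([j for _, j in order[:k]])
    return result


def snn_validity_sketch(points, labels, k=5):
    """Hypothetical 'separation minus compactness' score (higher is better).

    Loosely follows the two principles in the abstract; it is NOT the published
    ANCV index (the augmented non-shared nearest neighbour step is omitted).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to measure separation")

    nbrs = knn_indices(points, k)
    nbr_sets = [set(s) for s in nbrs]

    # Compactness proxy: for each point, the distance to the same-cluster
    # neighbour with which it shares the FEWEST nearest neighbours.
    within = []
    for i, neigh in enumerate(nbrs):
        same = [j for j in neigh if labels[j] == labels[i]]
        if same:
            j = min(same, key=lambda m: len(nbr_sets[i] & nbr_sets[m]))
            within.append(math.dist(points[i], points[j]))
    compactness = sum(within) / len(within) if within else 0.0

    # Separation proxy (assumption): for every pair of clusters, the distance
    # between their closest pair of points, i.e. where the two clusters meet.
    members = {c: [p for p, lab in zip(points, labels) if lab == c] for c in clusters}
    gaps = [
        min(math.dist(p, q) for p in members[a] for q in members[b])
        for a, b in combinations(clusters, 2)
    ]
    separation = sum(gaps) / len(gaps)

    return separation - compactness


def best_cluster_number(points, candidate_labelings, k=5):
    """Return the cluster number whose labeling scores highest.

    `candidate_labelings` maps a cluster number to a list of labels, e.g. the
    output of any clustering algorithm run with that many clusters.
    """
    return max(candidate_labelings,
               key=lambda c: snn_validity_sketch(points, candidate_labelings[c], k))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
    two = [0, 0, 0, 0, 1, 1, 1, 1]
    three = [0, 0, 0, 0, 1, 1, 2, 2]
    print(best_cluster_number(pts, {2: two, 3: three}, k=3))  # prints 2
```

On two well-separated 2×2 grids, the correct two-cluster labeling scores 8.0, while splitting the second grid into halves scores about 5.67 (the split adds a cluster pair only 1 unit apart), so the sketch selects 2 clusters. Using the closest point pair per cluster pair is a single-linkage-style simplification chosen for brevity; it is sensitive to outliers and is not taken from the paper.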