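Cluster analysis
Data mining
Homogeneity (statistics)
Set (abstract data type)
Computer science
Single-linkage clustering
Cluster (spacecraft)
Stability (learning theory)
Measure (data warehouse)
Determining the number of clusters in a data set
Mathematics
Context (archaeology)
Hierarchical clustering
Correlation clustering
Statistics
CURE data clustering algorithm
Machine learning
Geography
Archaeology
Programming language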
Authors
Serhat Emre Akhanlı, Christian Hennig
Identifier
DOI:10.1007/s11222-020-09958-2
Abstract
A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
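The calibration-and-aggregation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the two toy indexes (negative mean within-cluster distance for homogeneity, minimum between-cluster distance for separation), random clusterings generated by shuffling balanced labels, z-score standardization against those random clusterings, and a weighted mean as the aggregate. The function names `index_values`, `random_clustering`, and `aggregated_index` are hypothetical.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def index_values(D, labels):
    """Two toy validity indexes, both oriented so that larger is better (assumed choices)."""
    same = labels[:, None] == labels[None, :]
    other = ~same                      # pairs in different clusters
    np.fill_diagonal(same, False)      # ignore each point's zero distance to itself
    homogeneity = -D[same].mean()      # negative mean within-cluster distance
    separation = D[other].min()        # smallest between-cluster distance
    return np.array([homogeneity, separation])

def random_clustering(n, k, rng):
    """Assumed random-clustering scheme: shuffle a balanced label vector with k clusters."""
    return rng.permutation(np.arange(n) % k)

def aggregated_index(X, labels, n_random=200, weights=(1.0, 1.0), seed=0):
    """Calibrate each index against random clusterings of the same data, then average."""
    rng = np.random.default_rng(seed)
    D = pairwise_dist(X)
    k = len(np.unique(labels))
    ref = np.array([index_values(D, random_clustering(len(X), k, rng))
                    for _ in range(n_random)])
    mean, sd = ref.mean(axis=0), ref.std(axis=0)
    sd[sd == 0] = 1.0                  # guard against a degenerate reference set
    z = (index_values(D, labels) - mean) / sd   # z-score calibration (an assumption)
    w = np.asarray(weights, dtype=float)
    return float((w * z).sum() / w.sum())

# Usage: three well-separated synthetic groups versus an arbitrary random partition.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
true_labels = np.repeat(np.arange(3), 20)
X = centers[true_labels] + rng.normal(size=(60, 2))
good = aggregated_index(X, true_labels)
bad = aggregated_index(X, random_clustering(60, 3, rng))
print(good > bad)
```

The `weights` argument stands in for the user's choice of which indexes matter in a given application; setting a weight to zero drops that aspect from the aggregate.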