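Enhanced Data Rates for GSM Evolution
Convergence (economics)
Division (mathematics)
Curse of dimensionality
Computer science
Cluster (spacecraft)
Cluster analysis
Algorithm
Data mining
Topology (electrical circuits)
Mathematics
Artificial intelligence
Combinatorics
Economic growth
Programming language
Economy
Arithmetic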
Authors
Yuan Ping, Huina Li, Bin Hao, Chun Guo, Baocang Wang
Identifier
DOI:10.1016/j.patcog.2023.110036
Abstract
Although k-means and its variants are known for their remarkable efficiency, they depend heavily on prior knowledge of K and assume circle-like cluster patterns, so they may simply partition the input space instead of discovering data patterns that are not predetermined. We therefore propose beyond k-means++, which infers and uses explicit clusters by emphasizing local geometrical information for better cluster exploration. To avoid the dependence on K, we present a novel framework of iterative division and aggregation (IDA) over k-means++. It begins with any K≥1, then increases K as clusters are divided and reduces K as clusters are aggregated. To overcome the circle-like pattern limitation, we introduce a reasonability checking strategy (RCS) for cluster division. Given local geometrical information, RCS supports arbitrary cluster shapes by rejecting edge patterns with a distinguished convergence direction and merging adjacent clusters with pseudo-edge patterns. Furthermore, we design an edge shrinkage strategy (ESS). Taking edge patterns as the cluster prototype, it improves accuracy by effectively avoiding the loss of representability caused by irregular distributions. To compensate for the loss of efficiency, a near maximin and random sampling algorithm is suggested for large-scale, high-dimensional data. Experimental results confirm that beyond k-means++ handles arbitrary cluster shapes with remarkable accuracy.
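For orientation, the two building blocks the abstract names can be sketched in their textbook forms. This is a minimal illustration, not the authors' method: `kmeanspp_seeds` is standard k-means++ seeding (each new seed is drawn with probability proportional to its squared distance from the nearest existing seed), and `maximin_sample` is plain greedy farthest-first sampling, which is assumed here as a stand-in for the paper's "near maximin and random sampling" variant. The function names, signatures, and the pure-Python point representation are all hypothetical choices made for this sketch.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng):
    """Textbook k-means++ seeding (D^2 sampling); not the paper's extension."""
    seeds = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest seed so far.
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a seed: fall back to a uniform draw.
            idx = rng.randrange(len(points))
        else:
            # Draw an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        seeds.append(points[idx])
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return seeds


def maximin_sample(points, m, rng):
    """Greedy farthest-first sampling: each pick maximizes the distance
    to the nearest point already picked (assumed stand-in, see lead-in)."""
    first = rng.randrange(len(points))
    chosen = [first]
    d2 = [sq_dist(p, points[first]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=d2.__getitem__)
        chosen.append(nxt)
        d2 = [min(old, sq_dist(p, points[nxt])) for old, p in zip(d2, points)]
    return [points[i] for i in chosen]


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (illustrative data only).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeanspp_seeds(data, 2, random.Random(0)))
    print(maximin_sample(data, 2, random.Random(0)))
```

Both routines keep a running array of nearest-seed distances, so each new pick costs one pass over the data. The division, aggregation, RCS, and ESS steps described in the abstract are not reproduced here, since the abstract does not specify their criteria.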