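Cluster analysis
Ratio
CURE data clustering algorithm
Computer science
Sampling (signal processing)
Correlation clustering
Canopy clustering algorithm
Process (computing)
Algorithm
Cluster (spacecraft)
Data stream clustering
Data mining
Artificial intelligence
Computer vision
Physics
Operating system
Filter (signal processing)
Programming language
Quantum mechanics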
Authors
Shifei Ding,Chao Li,Xia Xu,Ling Ding,Jian Zhang,Lili Guo,Tianhao Shi
Identifier
DOI:10.1016/j.patcog.2022.109238
Abstract
With the rapid development of information technology, massive amounts of data are being generated, and discovering useful information in them to support decision-making has become a major focus of research. Clustering is regarded as one of the main means of handling large-scale data. Density peaks clustering (DPC) is an effective density-based clustering algorithm that is widely applied in many fields because of its satisfactory performance. However, the computational complexity of DPC is O(N²) for N data points, which makes it poorly suited to large-scale data. To solve this issue, a sampling-based density peaks clustering algorithm for large-scale data (SDPC) is proposed. First, a sampling method is used to reduce the number of distance calculations. Second, approximate representatives are identified by an improved TI search strategy, which further accelerates the clustering process. Next, the approximate representatives are clustered by DPC. Finally, each remaining point is allocated to the same cluster as its nearest representative. Experimental results on both synthetic and real-world datasets show that SDPC is more efficient than DPC, while its clustering performance remains at the same level as that of DPC.
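The abstract outlines the SDPC pipeline but does not give its formulas. The Python sketch below is a rough illustration only. It pairs a textbook DPC with the sample-then-assign idea. In that DPC, the local density ρ is a cutoff count of neighbours, δ is the distance to the nearest denser point, and the centres are the points with the largest ρ·δ.

This is not the authors' SDPC. A uniform random sample stands in for the paper's sampling method, and the improved TI search for approximate representatives is omitted entirely. The names `dpc`, `sampled_dpc`, `dc`, `n_clusters` and `sample_size` are hypothetical.

```python
import math
import random


def dpc(points, dc, n_clusters):
    """Textbook density peaks clustering on a small point set (O(n^2) distances).

    Hypothetical helper: cutoff-kernel density, centres chosen by largest rho*delta.
    """
    n = len(points)
    d = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Local density: number of other points closer than the cutoff distance dc.
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    # Visit points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank_of = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance as delta.
            delta[i] = max(d[i])
            continue
        # Nearest point among those ranked denser (earlier in the order).
        best = min(order[:rank], key=lambda j: d[i][j])
        delta[i] = d[i][best]
        parent[i] = best
    # Centres: largest gamma = rho * delta; the densest point always comes first.
    centres = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank_of[i]))[:n_clusters]
    labels = [-1] * n
    for k, c in enumerate(centres):
        labels[c] = k
    # Every non-centre inherits the label of its denser nearest neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def sampled_dpc(points, sample_size, dc, n_clusters, seed=0):
    """Sketch of sample-then-assign DPC (assumption: uniform random sampling, no TI search)."""
    rng = random.Random(seed)
    reps = [points[i] for i in rng.sample(range(len(points)), sample_size)]
    rep_labels = dpc(reps, dc, n_clusters)  # DPC runs only on the sample
    # Each point takes the cluster of its nearest sampled representative.
    labels = []
    for p in points:
        nearest = min(range(len(reps)), key=lambda r: math.dist(p, reps[r]))
        labels.append(rep_labels[nearest])
    return labels
```

With a sample of s points, the pairwise-distance work falls from about N² for plain DPC to about s² for the DPC step plus N·s for the final assignment. This illustrates the saving the abstract describes. According to the abstract, the paper's TI-based search reduces the cost further. For example, for two well-separated 5 × 5 grids of points, `sampled_dpc(points, sample_size=20, dc=1.0, n_clusters=2)` should almost always give one label per grid.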