Computer science
Cluster analysis
Space (punctuation)
Algorithm
Artificial intelligence
Data mining
Operating system
Authors
Yuqing Yang, Jianghui Cai, Haifeng Yang, Lanjuan Li, Xujun Zhao
Identifier
DOI: 10.1016/j.eswa.2022.117018
Abstract
Running the K-means algorithm on massive raw data incurs a huge time overhead and yields unstable clustering quality. To solve these problems, this paper introduces the concept of the influence space and, on this basis, proposes a new clustering algorithm named ISBFK-means. First, the influence space is used to divide the given data set into multiple small regions. Then, the representative data objects of each region are extracted to form a new data set; the class label of each representative data object is assigned to all data objects in its corresponding influence space. Next, K-means clustering is performed on the new data set to obtain the final clustering result. Theoretical analysis and experimental results show that this approach effectively reduces the amount of data processed during clustering and improves the stability of clustering quality. A major feature of this work is that celestial spectral data observed by the LAMOST survey are used specifically to verify ISBFK-means. The experimental results indicate that the algorithm outperforms similar algorithms in correctness, efficiency, and sensitivity to the quality of spectral data.

• Using the influence space reduces the impact of outliers on clustering results.
• Representative data objects of the original data sets are extracted under the influence space.
• A clustering algorithm called ISBFK-means is proposed based on the influence space.
• The ISBFK-means algorithm reveals valuable information hidden in low-quality LAMOST spectra.
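The pipeline described in the abstract (partition by influence space, cluster only the representatives, then propagate labels back) can be illustrated with a minimal Python sketch. The abstract does not say how the influence space is built, how regions are formed, or how representatives are chosen, so everything below is an assumption rather than the authors' ISBFK-means: the influence space of a point is taken as the point plus its k nearest neighbours and reverse k nearest neighbours (the definition used by the INFLO outlier score), regions are carved greedily starting from the largest influence spaces, each region's mean serves as its representative, and plain unweighted Lloyd's K-means is run on those representatives. All function names (`influence_spaces`, `partition`, `isbf_kmeans_sketch`, etc.) are hypothetical.

```python
import numpy as np


def knn_indices(X, k):
    """Indices of the k nearest neighbours of every point (self excluded)."""
    # Dense pairwise squared Euclidean distances: O(n^2) memory, fine for a sketch.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def influence_spaces(X, k):
    """ASSUMED definition (borrowed from INFLO): IS_k(p) = {p} ∪ kNN(p) ∪ RkNN(p)."""
    nn = knn_indices(X, k)
    spaces = [{i} | {int(j) for j in nn[i]} for i in range(len(X))]
    for i in range(len(X)):
        for j in nn[i]:
            spaces[int(j)].add(i)  # j is in kNN(i), so i is a reverse neighbour of j
    return spaces


def partition(X, k):
    """Greedily cut the data into disjoint regions, largest influence spaces first.

    Returns one representative per region (the region mean, a hypothetical choice)
    and the region id of every original point.
    """
    spaces = influence_spaces(X, k)
    region = np.full(len(X), -1)
    reps = []
    for i in sorted(range(len(X)), key=lambda p: -len(spaces[p])):
        if region[i] != -1:
            continue  # already absorbed by an earlier region
        members = [j for j in spaces[i] if region[j] == -1]
        region[members] = len(reps)
        reps.append(X[members].mean(axis=0))
    return np.array(reps), region


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's K-means with randomly chosen initial centres."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]

    def assign(ctr):
        return np.argmin(((X[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2), axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        # Move each centre to the mean of its points; keep it if its cluster is empty.
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(n_clusters)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers), centers


def isbf_kmeans_sketch(X, n_clusters, k=5, seed=0):
    """Cluster only the representatives, then give every point its region's label."""
    reps, region = partition(X, k)
    rep_labels, _ = kmeans(reps, n_clusters, seed=seed)
    return rep_labels[region]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 0.3, (100, 2)), rng.normal(10.0, 0.3, (100, 2))])
    reps, _ = partition(X, k=5)
    labels = isbf_kmeans_sketch(X, n_clusters=2, k=5)
    print(f"{len(X)} points -> {len(reps)} representatives")
    print("cluster sizes:", np.bincount(labels))
```

In this sketch K-means only ever sees the representatives, which is where a reduction in clustering work would come from; however, the dense O(n²) neighbour search used here would itself be too costly for massive data, so a spatial index or approximate nearest-neighbour search would be needed in practice. Weighting each representative by its region size during K-means is another plausible variant that the abstract does not settle.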