A fuzzy C-means algorithm for optimizing data clustering

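Computer science; Cluster analysis; Canopy clustering algorithm; CURE data clustering algorithm; Fuzzy clustering; Data mining; Initialization; Big data; Correlation clustering; Correctness; Data stream clustering; Artificial intelligence; Algorithm; Programming language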

Authors

Seyed Emadedin Hashemi, Fatemeh Gholian-Jouybari, Mostafa Hajiaghaei–Keshteli

Source

Journal: Expert Systems With Applications [Elsevier BV]
Date: 2023-05-05  Volume/issue: 227: 120377-120377  Citations: 78

Identifiers

DOI: 10.1016/j.eswa.2023.120377

Abstract

Big data has increasingly become predominant in many research fields affecting human knowledge, including medicine and engineering. Cluster analysis, or clustering, is widely recognized as one of the most effective processes to deal with various types of data, especially big data. There has been considerable interest in Fuzzy C-Means (FCM) as a method for clustering data using a short-distance approach in data mining. However, despite its simplicity, this method is not suitable for clustering large data sets due to their complex structure. In particular, FCM is sensitive to cluster center initialization, and an improper initialization can result in slow or non-optimal convergence. In order to solve the FCM convergence problem and find more appropriate cluster centers, optimization methods are generally used. In this study, a whale optimization algorithm is applied to solve the problem. As a solution to the problem of big data clustering, random sampling, clustering on samples, and extending the clustering results to all data are proposed. The proposed algorithm is implemented on several large data sets, both artificial and real, with many features after normalization and standardization. To verify the validity and correctness of the performance of the proposed algorithm, the same data sets have been clustered with other known algorithms, and the results compared using several valid fuzzy indices. Based on the comparison results, it can be concluded that the proposed algorithm is more powerful and efficient than other algorithms and, hence, can be used to effectively cluster large data sets. Our study can benefit organizations and managers who have a large amount of data and are unable to classify or make use of them properly. Using big data takes a lot of time. The features of the proposed algorithm would be of great help to managers allowing them to make better decisions and improve the quality of their work.
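
The sample-then-extend idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the textbook FCM update rules with fuzzifier m = 2, and a plain random choice of initial centers stands in for the whale optimization step, which the abstract does not detail. Extending the result to all data is assumed here to mean evaluating the standard membership formula against the centers found on the sample. The function names (`memberships`, `update_centers`, `fcm`, `cluster_large`) are hypothetical.

```python
import math
import random


def memberships(points, centers, m=2.0):
    """Textbook FCM membership: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # The point coincides with a center: full membership there.
            hit = dists.index(0.0)
            rows.append([1.0 if k == hit else 0.0 for k in range(len(centers))])
            continue
        rows.append([1.0 / sum((d_k / d_j) ** power for d_j in dists)
                     for d_k in dists])
    return rows


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by u_ik^m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


def fcm(points, c, m=2.0, iters=100, tol=1e-6, rng=None):
    """Alternate the two updates until the centers stop moving."""
    rng = rng or random.Random(0)
    # Assumption: random initial centers. The paper instead searches for
    # good centers with a whale optimization algorithm.
    centers = [list(p) for p in rng.sample(points, c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def cluster_large(points, c, sample_size, m=2.0, seed=0):
    """Cluster a random sample, then extend memberships to every point."""
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    centers = fcm(sample, c, m=m, rng=rng)
    return centers, memberships(points, centers, m)


if __name__ == "__main__":
    gen = random.Random(1)
    data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(300)]
    data += [(gen.gauss(10, 1), gen.gauss(10, 1)) for _ in range(300)]
    centers, u = cluster_large(data, c=2, sample_size=60)
    print(centers)
```

Only the sample passes through the iterative loop; the full data set is touched once, in the final membership evaluation, which is where the time saving on large data would come from in a scheme of this kind.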
