A fuzzy C-means algorithm for optimizing data clustering

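Computer science, Cluster analysis, Canopy clustering algorithm, CURE data clustering algorithm, Fuzzy clustering, Data mining, Initialization, Big data, Correlation clustering, Correctness, Data stream clustering, Artificial intelligence, Algorithm, Programming language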
Authors
Seyed Emadedin Hashemi, Fatemeh Gholian-Jouybari, Mostafa Hajiaghaei–Keshteli
Source
Journal: Expert Systems With Applications [Elsevier BV]
Volume/Issue: 227: 120377-120377 Cited by: 68
Identifier
DOI:10.1016/j.eswa.2023.120377
Abstract

Big data has increasingly become predominant in many research fields affecting human knowledge, including medicine and engineering. Cluster analysis, or clustering, is widely recognized as one of the most effective processes to deal with various types of data, especially big data. There has been considerable interest in Fuzzy C-Means (FCM) as a method for clustering data using a short-distance approach in data mining. However, despite its simplicity, this method is not suitable for clustering large data sets due to their complex structure. In particular, FCM is sensitive to cluster center initialization, and an improper initialization can result in slow or non-optimal convergence. In order to solve the FCM convergence problem and find more appropriate cluster centers, optimization methods are generally used. In this study, a whale optimization algorithm is applied to solve the problem. As a solution to the problem of big data clustering, random sampling, clustering on samples, and extending the clustering results to all data are proposed. The proposed algorithm is implemented on several large data sets, both artificial and real, with many features after normalization and standardization. To verify the validity and correctness of the performance of the proposed algorithm, the same data sets have been clustered with other known algorithms, and the results compared using several valid fuzzy indices. Based on the comparison results, it can be concluded that the proposed algorithm is more powerful and efficient than other algorithms and, hence, can be used to effectively cluster large data sets. Our study can benefit organizations and managers who have a large amount of data and are unable to classify or make use of them properly. Using big data takes a lot of time. The features of the proposed algorithm would be of great help to managers allowing them to make better decisions and improve the quality of their work.
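The sample-then-extend idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard FCM membership and center updates, and it starts from randomly chosen sample points where the paper applies a whale optimization algorithm. All function names and parameters (`memberships`, `update_centers`, `fcm`, `cluster_by_sampling`, `sample_size`, `seed`) are hypothetical, chosen only for this example.

```python
import math
import random


def memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster (sums to 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def update_centers(points, u, m=2.0):
    """Each center is the mean of the points weighted by u_ik ** m."""
    centers = []
    for k in range(len(u[0])):
        weights = [row[k] ** m for row in u]
        total = sum(weights)
        centers.append([
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(points[0]))
        ])
    return centers


def fcm(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, rng=random):
    """Plain FCM. Assumption: random initial centers stand in for the
    whale optimization step used in the paper."""
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(max_iter):
        u = [memberships(p, centers, m) for p in points]
        new_centers = update_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers have stopped moving
            break
    return centers


def cluster_by_sampling(data, n_clusters, sample_size, seed=0):
    """Cluster a random sample, then extend the result to all data."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(sample_size, len(data)))
    centers = fcm(sample, n_clusters, rng=rng)
    # Extension step: every point gets memberships from the sample's centers.
    return centers, [memberships(p, centers) for p in data]
```

In this sketch the iterative FCM loop only ever sees `sample_size` points, and the full data set is touched once, in the final membership pass. That is the cost saving the sampling approach is aiming for. Because plain FCM is sensitive to its starting centers, different `seed` values may give different results here.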