K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data

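Keywords: Big data; Cluster analysis; Computer science; Robustness (evolution); Algorithm; Initialization; Data mining; Euclidean distance; A priori and a posteriori; Machine learning; Artificial intelligence; Programming language; Chemistry; Philosophy; Epistemology; Gene; Biochemistry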
Authors
Abiodun M. Ikotun, Absalom E. Ezugwu, Laith Abualigah, Belal Abuhaija, Heming Jia
Source
Journal: Information Sciences [Elsevier]
Volume: 622, pages 178-210; Citations: 224
Identifier
DOI: 10.1016/j.ins.2022.11.139
Abstract

Recent advances in techniques for scientific data collection in the era of big data allow large quantities of data to be accumulated systematically at various data-capturing sites. Similarly, exponential growth in the development of data analysis approaches has been reported in the literature, amongst which the K-means algorithm remains the most popular and straightforward clustering algorithm. The algorithm's broad applicability across clustering application areas can be attributed to its simple implementation and low computational complexity. However, the K-means algorithm has many challenges that negatively affect its clustering performance. In the initialization process, users must specify the number of clusters in a given dataset a priori, while the initial cluster centers are selected randomly. Furthermore, the algorithm's performance is sensitive to the selection of these initial cluster centers, and for large datasets, determining the optimal number of clusters to start with is a complex and very challenging task. Moreover, because of the algorithm's greedy nature, the random selection of the initial cluster centers sometimes results in convergence to a local minimum. A further limitation is that the similarity between data objects is determined from their features using the Euclidean distance metric, which limits the algorithm's robustness in detecting other cluster shapes and poses a great challenge in detecting overlapping clusters. Many research efforts aimed at improving the K-means algorithm's performance and robustness have been conducted and reported in the literature. The current work presents an overview and taxonomy of the K-means clustering algorithm and its variants. The history of K-means, current trends, open issues and challenges, and recommended future research perspectives are also discussed.
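
To make the points about initialization concrete, the following is a minimal sketch of the basic K-means procedure the abstract refers to: the user supplies the number of clusters `k` up front, the initial centers are chosen at random from the data, and similarity is measured with (squared) Euclidean distance. It is an illustration only, not code from the reviewed paper; the function name `kmeans`, its parameters `n_iter` and `seed`, and the choice to keep an empty cluster's previous center are all assumptions made for this example.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Basic K-means with random initial centers (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k distinct data points as starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from each point to each center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: a (possibly local) minimum was reached
        centers = new_centers
    # Final assignment and within-cluster sum of squared distances.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    sse = dists[np.arange(len(X)), labels].sum()
    return centers, labels, sse
```

Running the function several times with different `seed` values and comparing the returned `sse` is one simple way to observe the sensitivity to initial centers: on data with many or unevenly sized groups, some seeds may settle at a noticeably higher value than others, which is the local-minimum behaviour described above.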