K-NNDP: K-means algorithm based on nearest neighbor density peak optimization and outlier removal

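Keywords: outlier, centroid, computer science, randomness, cluster analysis, k-nearest neighbors algorithm, scalability, algorithm, data mining, anomaly detection, stability (learning theory), pattern recognition (psychology), artificial intelligence, mathematics, machine learning, database, statistics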
Authors
Jiyong Liao,Xingjiao Wu,Yaxin Wu,Juelin Shu
Source
Journal: Knowledge-Based Systems [Elsevier BV]
Volume: 294, article 111742. Cited by: 4
Identifier
DOI:10.1016/j.knosys.2024.111742
Abstract

K-means is an unsupervised vector quantization method that originated in signal processing and is now used in data mining and knowledge discovery. Its advantages include simple operation, scalability, and suitability for large-scale datasets. However, K-means selects its initial cluster centers at random, which makes the clustering results unstable, and outliers degrade its performance. To address these challenges, we propose an algorithm that optimizes the initial cluster centers with nearest-neighbor density peaks (NNDP) and removes outliers. To avoid random initialization, we propose NNDP-based K-means (K-NNDP), which automatically selects the initial cluster centers based on decision values, ensuring stable operation. In addition, we adopt a local search strategy to eliminate outliers: outliers are identified using a set threshold, and the median replaces the mean in subsequent centroid iterations to reduce their impact. To date, most previous studies have addressed the two problems independently, which makes it easy for the algorithm to fall into a local optimum. We therefore address both problems jointly using K-nearest-neighbor modeling. To evaluate the effectiveness of K-NNDP, we conducted comparative experiments on several synthetic and real-world datasets. K-NNDP outperformed two classical algorithms and six state-of-the-art improved K-means algorithms. The results indicate that K-NNDP effectively mitigates both the randomness of K-means and its sensitivity to outliers, with significant improvements.
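
The abstract does not give the formulas behind the decision values or the outlier threshold, so the following Python sketch only illustrates the three ideas it names: density-based selection of initial centers, threshold-based outlier removal, and median centroid updates. Everything specific here is an assumption rather than the authors' method: density is taken as the inverse mean distance to the k nearest neighbors, the decision value follows the common density-peak form (density times distance to the nearest denser point), and a point is treated as an outlier when its mean neighbor distance exceeds `outlier_factor` times the dataset median. The function names and parameters (`knn_density`, `select_centers`, `k_nndp_sketch`, `outlier_factor`) are hypothetical.

```python
import numpy as np


def knn_density(X, k):
    """Return pairwise distances, mean k-NN distance, and an assumed density."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Column 0 of each sorted row is the point itself (distance 0), so skip it.
    knn_d = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    rho = 1.0 / (knn_d + 1e-12)  # assumed density: inverse mean k-NN distance
    return d, knn_d, rho


def select_centers(d, rho, n_clusters):
    """Pick the points with the largest decision value gamma = rho * delta."""
    n = len(rho)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # delta: distance to the nearest denser point; the densest point
        # gets its largest distance to any other point instead.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    gamma = rho * delta
    return np.argsort(gamma)[::-1][:n_clusters]


def k_nndp_sketch(X, n_clusters, k=5, outlier_factor=3.0, n_iter=100):
    """Deterministic K-means-style clustering; outliers get label -1."""
    X = np.asarray(X, dtype=float)
    d, knn_d, rho = knn_density(X, k)

    # Assumed threshold rule: flag points in unusually sparse neighborhoods.
    inlier = knn_d <= outlier_factor * np.median(knn_d)
    idx = np.flatnonzero(inlier)
    Xin = X[idx]

    # Initial centers come from decision values, not from random sampling.
    start = select_centers(d[np.ix_(idx, idx)], rho[idx], n_clusters)
    centers = Xin[start].copy()

    for _ in range(n_iter):
        dist = np.linalg.norm(Xin[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Coordinate-wise median instead of mean; keep an empty cluster's center.
        new_centers = np.array([
            np.median(Xin[labels == c], axis=0) if np.any(labels == c)
            else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    full_labels = np.full(len(X), -1)
    full_labels[idx] = labels
    return centers, full_labels
```

Because no step draws random numbers, repeated calls on the same data return the same centers, which is the stability property the abstract emphasizes. The sketch builds a full pairwise distance matrix, so it needs memory quadratic in the number of points and is only meant for small illustrative datasets.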