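Computer science
k-means clustering
Cluster analysis
Initialization
Outlier
Data mining
Algorithm
Artificial intelligence
Programming language
Authors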
Mohamed B. Abubaker,Wesam M. Ashour
Source
Journal: International Journal of Intelligent Systems and Applications
[MECS Publisher]
Date: 2013-02-03
Volume (issue): 5 (3): 37-49
Citations: 27
Identifier
DOI:10.5815/ijisa.2013.03.04
Abstract
This paper presents a new approach to overcome one of the most known disadvantages of the well-known Kmeans clustering algorithm. The problems of classical Kmeans are such as the problem of random initialization of prototypes and the requirement of predefined number of clusters in the dataset. Randomly initialized prototypes can often yield results to converge to local rather than global optimum. A better result of Kmeans may be obtained by running it many times to get satisfactory results. The proposed algorithms are based on a new novel definition of densities of data points which is based on the k-nearest neighbor method. By this definition we detect noise and outliers which affect Kmeans strongly, and obtained good initial prototypes from one run with automatic determination of K number of clusters. This algorithm is referred to as Efficient Initialization of Kmeans (EI-Kmeans). Still Kmeans algorithm used to cluster data with convex shapes, similar sizes, and densities. Thus we develop a new clustering algorithm called Efficient Data Clustering Algorithm (EDCA) that uses our new definition of densities of data points. The results show that the proposed algorithms improve the data clustering by Kmeans. EDCA is able to detect clusters with different non-convex shapes, different sizes and densities.
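
The abstract does not state the paper's density definition, so the following is only a rough sketch of the general idea of a k-nearest-neighbor density used to flag outliers. It assumes density is the inverse of the mean distance to the k nearest neighbors and that a point is an outlier when its density falls below a fraction of the median density. The function names `knn_density` and `flag_outliers` and the `ratio` parameter are hypothetical and are not taken from EI-Kmeans or EDCA.

```python
import numpy as np

def knn_density(points, k):
    """Assumed density proxy: inverse of the mean distance to the k nearest neighbors."""
    # Pairwise Euclidean distances, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0), so skip it.
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    # Small constant guards against division by zero for duplicate points.
    return 1.0 / (nearest.mean(axis=1) + 1e-12)

def flag_outliers(points, k, ratio=0.5):
    """Flag points whose density is below ratio * median density (an assumed rule)."""
    density = knn_density(points, k)
    return density < ratio * np.median(density)

# Usage: a 4x4 unit grid plus one distant point.
grid = np.array([[x, y] for x in range(4) for y in range(4)], dtype=float)
points = np.vstack([grid, [[50.0, 50.0]]])
print(flag_outliers(points, k=2))
```

Points that survive such a filter could then be ranked by density to pick initial prototypes, but how the paper does this, and how it determines K automatically, is not described in the abstract.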