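Cluster analysis
Computer science
Data mining
CURE data clustering algorithm
Fuzzy clustering
DBSCAN
Correlation clustering
Partition (number theory)
Canopy clustering algorithm
Data stream clustering
Consensus clustering
Clustering high-dimensional data
Artificial intelligence
Machine learning
Mathematics
Combinatorics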
Authors
Vunnava Dinesh Babu, K. Malathi
Identifier
DOI:10.11591/ijeecs.v29.i2.pp838-844
Abstract
Large datasets have become common in data mining, where vast amounts of data must be processed, stored, and handled. However, handling and processing large datasets is time-consuming and memory-intensive. As a result, researchers have adopted partitioning strategies to improve controllability and performance and to reduce the time and memory required to handle large datasets. Unfortunately, the numerous clustering techniques available in the literature can make it hard for experts to choose the best technique for a given dataset. Furthermore, no single clustering technique can address every problem, such as cluster structure, noise, or density. Existing clustering techniques also need scalable solutions to manage large datasets. Therefore, this paper proposes an ensemble partition-based clustering method with majority voting for large dataset partitioning, aggregating the k-means, k-medoids, fuzzy c-means, expectation-maximization (EM), and density-based spatial clustering of applications with noise (DBSCAN) techniques. In the first stage, each of these techniques clusters the large dataset individually. In the second stage, the final clusters are determined by majority voting among the five clustering algorithms: each data instance is assigned to the cluster that receives the most votes. The experimental findings demonstrate that the ensemble partition-based clustering method surpasses the five individual clustering algorithms in terms of execution time and accuracy.
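The voting stage described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the five base algorithms have already produced hard label vectors that use labels 0..k-1 (so fuzzy c-means memberships have been hardened and DBSCAN noise points have been given a cluster). Because cluster ids are arbitrary across algorithms, it also assumes the labelings are aligned to the first one before voting; the abstract does not say how alignment is done. The names `align_labels` and `majority_vote` and the toy label vectors are hypothetical.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, labels, k):
    """Relabel `labels` to agree with `reference` as much as possible.

    Assumption: brute force over all k! label permutations, which is
    only practical for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def majority_vote(labelings, k):
    """Combine several hard clusterings of the same instances by voting."""
    reference = labelings[0]
    aligned = [list(reference)]
    aligned += [align_labels(reference, lab, k) for lab in labelings[1:]]
    final = []
    for votes in zip(*aligned):
        # Ties go to the label seen first among the voters.
        final.append(Counter(votes).most_common(1)[0][0])
    return final


# Toy label vectors for six instances (illustrative, not real results).
labelings = [
    [0, 0, 0, 1, 1, 1],  # e.g. k-means
    [1, 1, 1, 0, 0, 0],  # e.g. k-medoids: same partition, swapped ids
    [0, 0, 1, 1, 1, 1],  # e.g. fuzzy c-means after hardening
    [1, 1, 1, 0, 0, 0],  # e.g. EM
    [0, 0, 0, 0, 1, 1],  # e.g. DBSCAN
]
final = majority_vote(labelings, k=2)
print(final)
```

With five voters and two clusters a tie cannot occur; with more clusters, a tie-breaking rule has to be chosen, and the one above is only one possibility.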