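DBSCAN
Cluster analysis
Computer science
Data mining
Data set
Set (abstract data type)
Big data
Algorithm
Sample (material)
CURE data clustering algorithm
Correlation clustering
Pattern recognition (psychology)
Artificial intelligence
Chromatography
Chemistry
Programming language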
Authors
Nooshin Hanafi, Hamid Saadatfar
Identifier
DOI:10.1016/j.eswa.2022.117501
Abstract
Today, data is being generated at high speed, and managing large volumes of data has become a challenge. Clustering is a method for analyzing data generated on the Internet, and various approaches to data clustering have been presented so far. Among them, DBSCAN is one of the best-known density-based clustering algorithms. It can detect clusters of different shapes and does not require prior knowledge of the number of clusters. A major part of DBSCAN's run time is spent calculating the pairwise distances between samples in order to find the neighbors of each sample in the dataset. The time complexity of the algorithm is O(n²); therefore, it is not suitable for processing big datasets. In this paper, DBSCAN is improved so that it can be applied to big datasets. The proposed method accurately calculates the density of each sample based on a reduced set of data. This reduced set, called the operational set, is updated periodically. Using only local samples to calculate the density greatly reduces the computational cost of clustering. Empirical results on various datasets of different sizes and dimensions show that the proposed algorithm increases clustering speed compared to recent related works while achieving accuracy similar to that of the original DBSCAN algorithm.
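To illustrate why the neighbor search dominates the run time, the following is a minimal sketch of the brute-force density step in standard DBSCAN. It is not the paper's operational-set method, whose details the abstract does not give. The function names `neighbor_counts` and `core_flags`, the parameters `eps` and `min_pts`, and the sample points are assumptions made for this illustration.

```python
import math


def neighbor_counts(points, eps):
    """Count, for every point, how many points lie within eps (itself included).

    Every pair of points is compared once, so the work grows as O(n^2).
    """
    n = len(points)
    counts = [1] * n  # each point counts as its own neighbor
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                counts[i] += 1
                counts[j] += 1
    return counts


def core_flags(points, eps, min_pts):
    """Mark a point as a core point when its eps-neighborhood holds at least min_pts points."""
    return [c >= min_pts for c in neighbor_counts(points, eps)]


# Hypothetical toy data: three nearby points and one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(neighbor_counts(points, eps=1.5))
print(core_flags(points, eps=1.5, min_pts=3))
```

The nested loop makes the quadratic cost explicit: doubling the number of samples roughly quadruples the number of distance computations. This is the cost the abstract says the proposed method reduces by computing densities against a smaller set of local samples.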