NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set

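Computer science, data set, R package, set (abstract data type), artificial intelligence, programming language
Authors
Malika Charrad, Nadia Ghazzali, Véronique Boiteau, Azam Niknafs
Source
Journal: Le Centre pour la Communication Scientifique Directe - HAL - Diderot. Cited by: 1424
Identifier
DOI: 10.18637/jss.v061.i06
Abstract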

Clustering is the partitioning of a set of objects into groups (clusters) so that objects within a group are more similar to each other than to objects in different groups. Most clustering algorithms depend on some assumptions in order to define the subgroups present in a data set. As a consequence, the resulting clustering scheme requires some sort of evaluation of its validity. The evaluation procedure has to tackle difficult problems such as the quality of clusters, the degree to which a clustering scheme fits a specific data set, and the optimal number of clusters in a partitioning. In the literature, a wide variety of indices have been proposed to find the optimal number of clusters in a partitioning of a data set during the clustering process. However, for most of the indices proposed in the literature, programs are unavailable to test these indices and compare them. The R package NbClust has been developed for that purpose. It provides 30 indices which determine the number of clusters in a data set, and it also offers the user the best clustering scheme from the different results. In addition, it provides a function to perform k-means and hierarchical clustering with different distance measures and aggregation methods. Any combination of validation indices and clustering methods can be requested in a single function call. This enables the user to simultaneously evaluate several clustering schemes while varying the number of clusters, helping to determine the most appropriate number of clusters for the data set of interest.
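
The idea of scoring several clustering schemes while varying the number of clusters can be illustrated with a small sketch. The Python code below is not NbClust and does not reproduce its interface. It assumes a toy data set of three well-separated groups, a plain k-means with a deterministic farthest-first initialisation, and a single validity index (Calinski-Harabasz) chosen purely for illustration. The function names `kmeans`, `calinski_harabasz` and `best_number_of_clusters` are hypothetical.

```python
import math


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = assign(points, centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:  # converged
            break
        centers = updated
    return assign(points, centers), centers


def calinski_harabasz(points, labels, centers):
    """Between-cluster over within-cluster dispersion, each scaled by its
    degrees of freedom. Larger values indicate a better partition."""
    n, k = len(points), len(centers)
    grand = tuple(sum(c) / n for c in zip(*points))
    within = sum(math.dist(p, centers[lab]) ** 2
                 for p, lab in zip(points, labels))
    between = sum(labels.count(j) * math.dist(centers[j], grand) ** 2
                  for j in range(k))
    if within == 0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))


def best_number_of_clusters(points, min_nc, max_nc):
    """Score every k in [min_nc, max_nc] and return (best k, all scores)."""
    scores = {}
    for k in range(min_nc, max_nc + 1):
        labels, centers = kmeans(points, k)
        scores[k] = calinski_harabasz(points, labels, centers)
    return max(scores, key=scores.get), scores


# Toy data (an assumption for this sketch): three groups of four points each.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 0), (10, 1), (11, 0), (11, 1),
        (5, 10), (5, 11), (6, 10), (6, 11)]

best_k, scores = best_number_of_clusters(data, 2, 5)
print(best_k)
```

On this toy data the index peaks at three clusters. A package such as NbClust goes further by computing many indices for each candidate number of clusters and combining their recommendations, which this single-index sketch does not attempt.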
