Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection

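Cluster analysis, Computer science, Visualization, Outlier, Anomaly detection, Data mining, Hierarchy, Hierarchical clustering, Pattern recognition (psychology), CURE data clustering algorithm, Mathematics, Artificial intelligence, Correlation clustering, Market economy, Economics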
Authors
Ricardo J. G. B. Campello, Davoud Moulavi, Arthur Zimek, Jörg Sander
Source
Journal: ACM Transactions on Knowledge Discovery From Data [Association for Computing Machinery]
Volume (issue): 10 (1): 1-51 Citations: 640
Identifier
DOI: 10.1145/2733381
Abstract

An integrated framework for density-based cluster analysis, outlier detection, and data visualization is introduced in this article. The main module consists of an algorithm to compute hierarchical estimates of the level sets of a density, following Hartigan’s classic model of density-contour clusters and trees. Such an algorithm generalizes and improves existing density-based clustering techniques with respect to different aspects. It provides as a result a complete clustering hierarchy composed of all possible density-based clusters following the nonparametric model adopted, for an infinite range of density thresholds. The resulting hierarchy can be easily processed so as to provide multiple ways for data visualization and exploration. It can also be further postprocessed so that: (i) a normalized score of “outlierness” can be assigned to each data object, which unifies both the global and local perspectives of outliers into a single definition; and (ii) a “flat” (i.e., nonhierarchical) clustering solution composed of clusters extracted from local cuts through the cluster tree (possibly corresponding to different density thresholds) can be obtained, either in an unsupervised or in a semisupervised way. In the unsupervised scenario, the algorithm corresponding to this postprocessing module provides a global, optimal solution to the formal problem of maximizing the overall stability of the extracted clusters. If partially labeled objects or instance-level constraints are provided by the user, the algorithm can solve the problem by considering both constraints violations/satisfactions and cluster stability criteria. An asymptotic complexity analysis, both in terms of running time and memory space, is described. Experiments are reported that involve a variety of synthetic and real datasets, including comparisons with state-of-the-art, density-based clustering and (global and local) outlier detection methods.
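
The idea of density-contour clusters nested across density thresholds can be illustrated with a toy example. The sketch below is not the article's algorithm: it assumes one-dimensional data, uses the distance to the k-th nearest neighbour as a crude inverse density estimate, and treats a cluster as a connected group of sufficiently dense points. The names `core_distance`, `level_set_clusters`, `eps`, and `k` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbour (itself excluded)."""
    dists = sorted(abs(points[i] - p) for j, p in enumerate(points) if j != i)
    return dists[k - 1]

def level_set_clusters(points, eps, k):
    """Connected components of the points whose k-NN distance is <= eps.

    Two retained points are linked when they lie within eps of each other.
    Points that fail the density test are returned separately as noise.
    """
    dense = [i for i in range(len(points)) if core_distance(points, i, k) <= eps]
    parent = {i: i for i in dense}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(dense, 2):
        if abs(points[a] - points[b]) <= eps:
            parent[find(a)] = find(b)

    groups = {}
    for i in dense:
        groups.setdefault(find(i), []).append(points[i])
    clusters = sorted(sorted(g) for g in groups.values())
    noise = sorted(points[i] for i in range(len(points)) if i not in parent)
    return clusters, noise

# Hypothetical toy data: two tight groups and one isolated point.
points = [0.0, 0.1, 0.2, 0.3, 2.0, 5.0, 5.1, 5.2]

# A larger eps corresponds to a lower density threshold.
for eps in (0.25, 2.0, 3.5):
    print(eps, level_set_clusters(points, eps, k=2))
```

On this toy data, raising `eps` (that is, lowering the density threshold) first leaves the isolated point as noise, then absorbs it into the nearer group, and finally merges everything into one cluster. Collecting these solutions over all thresholds is what gives a hierarchy of nested clusters rather than a single partition.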