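Cluster analysis
Categorical variable
Computer science
Data mining
Hierarchical clustering
Bayesian probability
Feature (linguistics)
Clustering high-dimensional data
Artificial intelligence
Consensus clustering
Machine learning
Pattern recognition (psychology)
Correlation clustering
CURE data clustering algorithm
Linguistics
Philosophy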
Authors
Han Yan, Jiexing Wu, Yang Li, Jun S. Liu
Abstract
Bi-clustering is a useful approach for analyzing large biological data sets when the observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach to tackling bi-clustering problems in moderate to high dimensions and propose three Bayesian bi-clustering models for categorical data, of increasing complexity in how they model the distributions of features across bi-clusters. Our proposed methods apply to a wide range of scenarios: from situations where data are cluster-distinguishable only among a small subset of features but masked by a large amount of noise, to situations where different groups of data are identified by different sets of features or where the data exhibit hierarchical structures. Through simulation studies, we show that our methods outperform existing (bi-)clustering methods in both identifying clusters and recovering feature distributional patterns across bi-clusters. We further apply the developed approaches to a human genetic dataset, a human single-cell genomic dataset, and a collection of 1774 mouse genomic datasets with a focus on 58 genes from two pathways.