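Cluster analysis
Computer science
Hierarchical clustering
Data mining
Consensus clustering
Statistical hypothesis testing
Artificial intelligence
Correlation clustering
CURE data clustering algorithm
Mathematics
Statistics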
Authors
Isabella N. Grabski, Kelly Street, Rafael A. Irizarry
Source
Journal: Nature Methods
[Springer Nature]
Date: 2023-07-10
Volume (issue), pages: 20 (8): 1196-1202
Citations: 18
Identifier
DOI: 10.1038/s41592-023-01933-9
Abstract
Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
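To make the idea of testing whether a cluster split is statistically significant more concrete, the following is a minimal sketch of a Monte Carlo significance test for a two-way split. It is not the authors' method: it assumes a single multivariate Gaussian as the null model (the abstract does not specify the parametric model used for count data), uses a plain 2-means cluster index as the test statistic, and the function names `two_means_index` and `split_p_value` are hypothetical names chosen for illustration.

```python
import numpy as np


def two_means_index(x, n_iter=50):
    """2-means cluster index: within-cluster SS / total SS.

    Smaller values indicate a stronger two-cluster structure.
    """
    centered = x - x.mean(axis=0)
    # Initialise labels by the sign of the projection on the first
    # principal direction, then refine with Lloyd iterations.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    labels = (centered @ vt[0] > 0).astype(int)
    for _ in range(n_iter):
        if labels.min() == labels.max():
            break  # degenerate split: one cluster is empty
        centers = np.stack([x[labels == k].mean(axis=0) for k in (0, 1)])
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    if labels.min() == labels.max():
        return 1.0  # no split found, so no variance is explained
    total_ss = (centered ** 2).sum()
    within_ss = sum(
        ((x[labels == k] - x[labels == k].mean(axis=0)) ** 2).sum()
        for k in (0, 1)
    )
    return within_ss / total_ss


def split_p_value(x, n_sim=200, seed=0):
    """Monte Carlo p-value for H0: the data come from one Gaussian.

    ASSUMPTION: a single multivariate Gaussian null fitted to the data
    (requires at least two features); this stands in for whatever
    parametric model a real single-cell workflow would use.
    """
    rng = np.random.default_rng(seed)
    observed = two_means_index(x)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    null = np.array([
        two_means_index(rng.multivariate_normal(mean, cov, size=len(x)))
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null <= observed)) / (1 + n_sim)
```

In this sketch, a small p-value suggests that the observed split is tighter than splits of data simulated from one homogeneous population, while a large p-value is the kind of evidence the abstract associates with over-clustering. Applying such a test to clusters reported by another algorithm, or accounting for batch structure, would need further modelling that is not shown here.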