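Cluster analysis
Hierarchical clustering
Data mining
Random forest
Benchmarking
Identification (biology)
Computer science
Machine learning
Biology
Botany
Business
Marketing
Authors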
Cathrine Petersen, Lennart Mucke, M. Ryan Corces
Identifier
DOI:10.1101/2024.01.18.576317
Abstract
Clustering is a critical step in the analysis of single-cell data, as it enables the discovery and characterization of putative cell types and states. However, most popular clustering tools do not subject clustering results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (clustering hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine which clusters represent distinct populations. We demonstrate the enhanced performance of CHOIR through extensive benchmarking against 14 existing clustering methods across 100 simulated and 4 real single-cell RNA-seq, ATAC-seq, spatial transcriptomic, and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable, and robust solution to the important challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.
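The idea of testing whether two candidate clusters are statistically distinct can be illustrated with a minimal sketch. This is not CHOIR's implementation or API: to stay dependency-free it swaps in a simple nearest-centroid classifier where the abstract describes random forest classifiers, and it tests a single pair of clusters rather than walking a hierarchical clustering tree. All function names and parameters (`holdout_accuracy`, `permutation_test`, `n_perm`, `seed`) are hypothetical. The sketch compares the held-out accuracy obtained with the real cluster labels against the accuracies obtained after shuffling the labels; a small p-value suggests the two clusters are separable beyond chance.

```python
import random


def centroid(points):
    """Mean position of a list of equal-length numeric vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(points, labels, rng):
    """Fit a nearest-centroid classifier (a stand-in for a random forest,
    assumed here for simplicity) on a random half of the cells and return
    its accuracy on the other half. Labels must be 0 or 1."""
    idx = list(range(len(points)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    cents = {}
    for lab in (0, 1):
        members = [points[i] for i in train if labels[i] == lab]
        if not members:
            return 0.5  # degenerate split: treat as chance-level accuracy
        cents[lab] = centroid(members)
    correct = sum(
        1
        for i in test
        if min(cents, key=lambda lab: sq_dist(points[i], cents[lab])) == labels[i]
    )
    return correct / len(test)


def permutation_test(points, labels, n_perm=200, seed=0):
    """Return (observed accuracy, permutation p-value) for two clusters.

    The null distribution is built by shuffling the cluster labels, which
    breaks any link between a cell's features and its cluster assignment."""
    rng = random.Random(seed)
    observed = holdout_accuracy(points, labels, rng)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if holdout_accuracy(points, shuffled, rng) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (at_least_as_good + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Two synthetic, well-separated "cell populations" in a 2-D feature space.
    cluster_a = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(40)]
    cluster_b = [[data_rng.gauss(6, 1), data_rng.gauss(6, 1)] for _ in range(40)]
    acc, p = permutation_test(cluster_a + cluster_b, [0] * 40 + [1] * 40)
    print(f"held-out accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```

In a tree-based setting, a test of this kind would plausibly be applied to pairs of sibling or neighboring clusters, merging those that fail to reach significance; the thresholds and traversal order are left unspecified here because the abstract does not state them.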