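Cluster analysis
Feature selection
Computer science
Feature (linguistics)
Artificial intelligence
Set (abstract data type)
Correlation clustering
Selection (genetic algorithm)
Pattern recognition (psychology)
Data mining
Consensus clustering
CURE data clustering algorithm
Linguistics
Philosophy
Programming language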
Authors
Kenong Su, Tianwei Yu, Hao Wu
Abstract
Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is to select a subset of genes (referred to as 'features'), whose expression patterns are then used for downstream clustering. A good feature set should include the genes that distinguish different cell types, and the quality of such a set can have a significant impact on clustering accuracy. All existing scRNA-seq clustering tools include a feature selection step that relies on simple unsupervised feature selection methods, mostly based on the statistical moments of gene-wise expression distributions. In this work, we carefully evaluate the impact of feature selection on cell clustering accuracy. In addition, we develop a feature selection algorithm named FEAture SelecTion (FEAST), which provides more representative features. We apply the method to 12 public scRNA-seq datasets and demonstrate that using features selected by FEAST with existing clustering tools significantly improves clustering accuracy.
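For readers unfamiliar with the moment-based feature selection the abstract refers to, the following is a minimal sketch of one such baseline: ranking genes by the variance of their log-transformed counts and keeping the top few. It is not the FEAST algorithm, whose details are not given here. The function name `select_top_variance_genes`, the log1p transform, and the toy count matrix are all assumptions made for illustration.

```python
import numpy as np


def select_top_variance_genes(counts, n_features):
    """Return indices of the n_features genes with the highest variance.

    counts: array-like of shape (n_cells, n_genes) holding raw counts.
    Variance is computed on log1p-transformed counts (an assumed,
    commonly used transform; not taken from the abstract).
    """
    log_expr = np.log1p(np.asarray(counts, dtype=float))
    gene_var = log_expr.var(axis=0)  # second moment, one value per gene
    order = np.argsort(gene_var, kind="stable")[::-1]  # highest variance first
    return order[:n_features]


# Hypothetical toy data: 6 cells x 4 genes. Genes 0 and 2 differ between
# two groups of cells; genes 1 and 3 are constant and carry no signal.
counts = np.array([
    [0, 5, 10, 1],
    [0, 5, 12, 1],
    [0, 5, 11, 1],
    [9, 5, 0, 1],
    [8, 5, 1, 1],
    [9, 5, 0, 1],
])

selected = select_top_variance_genes(counts, n_features=2)
print(sorted(selected.tolist()))
```

The selected columns, `counts[:, selected]`, would then be passed to whichever clustering tool is in use. A variance ranking of this kind uses no information about cell types, which is the limitation the abstract attributes to existing selection steps.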