聚类分析
降维
计算机科学
非负矩阵分解
插补(统计学)
人工智能
数据挖掘
模式识别(心理学)
高维数据聚类
矩阵分解
机器学习
缺少数据
量子力学
物理
特征向量
作者
Yushan Qiu,Yan Chang,Pu Zhao,Quan Zou
摘要
Abstract Motivation Single-cell RNA sequencing (scRNA-seq) technology attracts extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types based on unsupervised clustering. Data imputation and dimension reduction are conducted before clustering because scRNA-seq has a high ‘dropout’ rate, noise and linear inseparability. However, independence of dimension reduction, imputation and clustering cannot fully characterize the pattern of the scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that utilizes a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate the cell annotation as prior information, then transform the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI has a faster convergence speed, better dimensionality reduction performance and a more accurate cell clustering performance than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analysis are also conducted to validate the biological significance of our method, including pseudotime analysis, gene ontology and survival analysis. We believe that we are among the first to introduce imputation, partial label information, dimension reduction and clustering to the single-cell field. Availability and implementation The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.
科研通智能强力驱动
Strongly Powered by AbleSci AI