注释
计算机科学
正确性
聚类分析
人工智能
非负矩阵分解
监督学习
机器学习
模式识别(心理学)
半监督学习
仿形(计算机编程)
分类器(UML)
数据挖掘
矩阵分解
算法
人工神经网络
操作系统
特征向量
物理
量子力学
作者
Dibyendu Bikash Seal,Vivek Das,Rajat K. De
标识
DOI:10.1007/s10489-022-03440-4
摘要
Single cell RNA sequencing (scRNA-seq) allows global transcriptomic profiling at a cellular resolution, thus, identifying underlying cell types and corresponding lineages. Such cell type identification and annotation rely heavily on models that learn by training themselves on a large amount of individual cells with accurate, annotated labels. Presently, this task of cell-type annotation is done based on inspection of marker genes from each of the statistically significant groups of cells. This is both challenging and time consuming. In this article, we have proposed a semi-supervised cell-type annotation method, called CASSL, based on Non-negative matrix factorization (NMF) coupled with recursive k-means algorithm. A semi-supervised model is capable of learning labels for a large amount of unlabelled data with the help of a limited amount of labelled data. The effectiveness of CASSL has been demonstrated on eight publicly available human and mice scRNA-seq datasets across varied organs and protocols. It has been able to correctly annotate majority of the unlabelled cells with high accuracy. It has also been evaluated for its correctness of clustering solution, robustness across varying percentage of missing labels, and time taken for execution. When compared with state-of-the-art unsupervised and semi-supervised cell-type annotation methods, CASSL has consistently outperformed others across all metrics for most of the datasets. It has also shown competitive results when compared against state-of-the-art supervised methods.
科研通智能强力驱动
Strongly Powered by AbleSci AI