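Keywords
Autoencoder
Computer science
Cluster analysis
Scalability
Dropout (neural networks)
Artificial intelligence
Curse of dimensionality
Pattern recognition (psychology)
Data mining
Clustering high-dimensional data
Artificial neural network
Machine learning
Database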
Authors
Bang Tran, Duc Tran, Hung Son Nguyen, Seungil Ro, Tin Nguyen
Identifier
DOI: 10.1038/s41598-022-14218-6
Abstract
Unsupervised clustering of single-cell RNA sequencing (scRNA-seq) data is important because it allows us to identify putative cell types. However, the large number of cells (up to millions), the high dimensionality of the data (tens of thousands of genes), and the high dropout rates all pose substantial challenges in single-cell analysis. Here we introduce a new method, named single-cell Clustering using Autoencoder and Network fusion (scCAN), that can overcome these challenges to accurately segregate different cell types in large, sparse scRNA-seq data. In an extensive analysis using 28 real scRNA-seq datasets (more than three million cells) and 243 simulated datasets, we validate that scCAN (1) correctly estimates the number of true cell types, (2) accurately segregates cells of different types, (3) is robust against dropouts, and (4) is fast and memory efficient. We also compare scCAN with CIDR, SEURAT3, Monocle3, SHARP, and SCANPY; scCAN outperforms these state-of-the-art methods in both accuracy and scalability. The scCAN package is available at https://cran.r-project.org/package=scCAN. Data and R scripts are available at http://sccan.tinnguyen-lab.com/