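Computer science
Annotation
Artificial intelligence
Schema (genetic algorithms)
Generalization
Machine learning
Scaling
Context (archaeology)
Deep learning
Data mining
Biology
Geometry
Mathematics
Mathematical analysis
Paleontology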
Authors
Felix Fischer, David S. Fischer, R. S. Mukhin, А. В. Исаев, Evan Biederstedt, Alexandra‐Chloé Villani, Fabian J. Theis
Identifier
DOI:10.1038/s41467-024-51059-5
Abstract
Identifying cellular identities is a key use case in single-cell transcriptomics. While machine learning has been leveraged to automate cell annotation predictions for some time, there has been little progress in scaling neural networks to large data sets and in constructing models that generalize well across diverse tissues. Here, we propose scTab, an automated cell type prediction model specific to tabular data, and train it using a novel data augmentation scheme across a large corpus of single-cell RNA-seq observations (22.2 million cells). In this context, we show that cross-tissue annotation requires nonlinear models and that the performance of scTab scales both in terms of training dataset size and model size. Additionally, we show that the proposed data augmentation schema improves model generalization. In summary, we introduce a de novo cell type prediction model for single-cell RNA-seq data that can be trained across a large-scale collection of curated datasets and demonstrate the benefits of using deep learning methods in this paradigm.