scGAD: a new task and end-to-end framework for generalized cell type annotation and discovery

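Annotation; End-to-end principle; Computer science; Artificial intelligence; Task (project management); Type (biology); Biology; Engineering; Systems engineering; Ecology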
Authors
Yuyao Zhai, Liang Chen, Minghua Deng
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 24 (2)  Citations: 6
Identifier
DOI: 10.1093/bib/bbad045
Abstract

The rapid development of single-cell RNA sequencing (scRNA-seq) technology allows us to study gene expression heterogeneity at the cellular level. Cell annotation is the basis for subsequent downstream analysis in single-cell data mining. As more and more well-annotated scRNA-seq reference data become available, many automatic annotation methods have sprung up in order to simplify the cell annotation process on unlabeled target data. However, existing methods rarely explore the fine-grained semantic knowledge of novel cell types absent from the reference data, and they are usually susceptible to batch effects on the classification of seen cell types. Taking into consideration the limitations above, this paper proposes a new and practical task called generalized cell type annotation and discovery for scRNA-seq data whereby target cells are labeled with either seen cell types or cluster labels, instead of a unified ‘unassigned’ label. To accomplish this, we carefully design a comprehensive evaluation benchmark and propose a novel end-to-end algorithmic framework called scGAD. Specifically, scGAD first builds the intrinsic correspondences on seen and novel cell types by retrieving geometrically and semantically mutual nearest neighbors as anchor pairs. Together with the similarity affinity score, a soft anchor-based self-supervised learning module is then designed to transfer the known label information from reference data to target data and aggregate the new semantic knowledge within target data in the prediction space. To enhance the inter-type separation and intra-type compactness, we further propose a confidential prototype self-supervised learning paradigm to implicitly capture the global topological structure of cells in the embedding space. Such a bidirectional dual alignment mechanism between embedding space and prediction space can better handle batch effect and cell type shift.
Extensive results on massive simulation datasets and real datasets demonstrate the superiority of scGAD over various state-of-the-art clustering and annotation methods. We also implement marker gene identification to validate the effectiveness of scGAD in clustering novel cell types and their biological significance. To the best of our knowledge, we are the first to introduce this new and practical task and propose an end-to-end algorithmic framework to solve it. Our method scGAD is implemented in Python using the PyTorch machine-learning library, and it is freely available at https://github.com/aimeeyaoyao/scGAD.
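The abstract says anchor pairs are retrieved as mutual nearest neighbors between reference and target cells. The following is a minimal sketch of that general idea only, not the scGAD implementation: it assumes plain Euclidean distance on generic feature vectors and covers only a geometric notion of neighborhood. The function name `mutual_nearest_neighbors`, the parameter `k`, and the toy arrays are hypothetical and chosen for illustration.

```python
import numpy as np

def mutual_nearest_neighbors(ref, tgt, k=2):
    """Return (i, j) pairs where ref[i] and tgt[j] are in each other's k-NN lists.

    Illustrative only: Euclidean distance on raw features is an assumption,
    not the similarity used by scGAD.
    """
    # Pairwise squared Euclidean distances, shape (n_ref, n_tgt).
    d = ((ref[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=2)
    # For each reference cell, indices of its k nearest target cells.
    ref_to_tgt = np.argsort(d, axis=1)[:, :k]
    # For each target cell, indices of its k nearest reference cells.
    tgt_to_ref = np.argsort(d, axis=0)[:k, :].T
    pairs = []
    for i in range(ref.shape[0]):
        for j in ref_to_tgt[i]:
            # Keep the pair only if the neighbor relation holds in both directions.
            if i in tgt_to_ref[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy data: two reference cells and three target cells in 2-D.
ref = np.array([[0.0, 0.0], [10.0, 10.0]])
tgt = np.array([[0.1, 0.0], [10.0, 10.1], [4.0, 4.0]])
print(mutual_nearest_neighbors(ref, tgt, k=1))  # [(0, 0), (1, 1)]
```

In this toy case the third target cell has no mutual partner in the reference, which is the situation in which a target cell might belong to a novel cell type.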