Harnessing the Foundation Model for Exploration of Single-cell Expression Atlases in Plants

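Keywords: Annotation; Computer science; Inference; Cluster analysis; Source code; Data integration; Computational biology; Artificial intelligence; Data mining; Machine learning; Biology; Operating system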
Authors
Guangshuo Cao, Haoyu Chao, Wenjun Zheng, Yangming Lan, Kaiyan Lu, Yueyi Wang, Ming Chen, He Zhang, Dijun Chen
Source
Journal: Genomics, Proteomics & Bioinformatics [Elsevier]
Citations: 1
Identifier
DOI:10.1093/gpbjnl/qzaf024
Abstract

Single-cell RNA sequencing (scRNA-seq) provides unprecedented insights into plant cellular diversity by enabling high-resolution analyses of gene expression at the single-cell level. However, the complexity of scRNA-seq data, including challenges in batch integration, cell type annotation, and gene regulatory network (GRN) inference, demands advanced computational approaches. To address these challenges, we developed scPlantLLM, a Transformer model trained on millions of plant single-cell data points. Using a sequential pretraining strategy incorporating masked language modeling and cell type annotation tasks, scPlantLLM generates robust and interpretable single-cell data embeddings. When applied to Arabidopsis thaliana datasets, scPlantLLM excels in clustering, cell type annotation, and batch integration, achieving an accuracy of up to 0.91 in zero-shot learning scenarios. Furthermore, the model demonstrates an ability to identify biologically meaningful GRNs and subtle cellular subtypes, showcasing its potential to advance plant biology research. scPlantLLM outperforms traditional methods on key metrics such as the adjusted Rand index (ARI), normalized mutual information (NMI), and silhouette score (SIL), highlighting its superior clustering accuracy and biological relevance. scPlantLLM represents a foundational model for exploring plant single-cell expression atlases, offering unprecedented capabilities to resolve cellular heterogeneity and regulatory dynamics across diverse plant systems. The code used in this study is available at https://github.com/compbioNJU/scPlantLLM.
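
For readers unfamiliar with the adjusted Rand index named in the abstract, the following is a minimal sketch of how ARI compares a predicted clustering against reference cell type labels. It is not taken from the scPlantLLM repository; the function name `adjusted_rand_index` and the toy cell type labels are assumptions made purely for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        # No pairs of cells to compare; treat as trivially perfect agreement.
        return 1.0

    # Contingency counts: cells sharing a reference label and a predicted cluster.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs grouped together in both, in the reference, in the prediction.
    sum_both = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2        # agreement of a perfect match
    if max_index == expected:
        # Degenerate partitions (e.g. every cell in one cluster on both sides).
        return 1.0
    return (sum_both - expected) / (max_index - expected)


# Hypothetical toy data: six cells, two reference cell types, two predicted clusters.
reference = ["root hair"] * 3 + ["cortex"] * 3
predicted = [0, 0, 1, 1, 1, 1]
score = adjusted_rand_index(reference, predicted)
```

ARI is 1 when the two groupings agree up to a renaming of cluster labels and is close to 0 when agreement is no better than chance, which is why it is commonly reported alongside NMI and the silhouette score when benchmarking clustering quality.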