Keywords
Computer science, Metadata, Language model, Embedding, Raw data, Cluster analysis, Annotation, Function (biology), Generator (circuit theory), Domain (mathematical analysis), Artificial intelligence, Natural language processing, Data mining, Machine learning, Programming language, World Wide Web, Power (physics), Physics, Mathematical analysis, Biology, Evolutionary biology, Quantum mechanics, Mathematics
Authors
Tian-Yu Liu, Tianqi Chen, Wangjie Zheng, Xiao Luo, Hongyu Zhao
Identifier
DOI:10.1101/2023.12.07.569910
Abstract
Various Foundation Models (FMs) built on the pre-training and fine-tuning framework have been applied to single-cell data analysis, with varying degrees of success. In this manuscript, we propose scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) as a generator of both descriptions of metadata information and embeddings of those descriptions. We combine the embeddings from LLMs with the raw data under a zero-shot learning framework, and further extend the method with a fine-tuning framework to handle additional tasks. We demonstrate that scELMo is capable of cell clustering, batch effect correction, and cell-type annotation without training a new model. Moreover, the fine-tuning framework of scELMo can help with more challenging tasks, including in-silico treatment analysis and modeling perturbation. scELMo has a lighter structure and lower resource requirements. In our evaluations, our method also outperforms recent large-scale FMs (such as scGPT [1] and Geneformer [2]) and other LLM-based single-cell data analysis pipelines (such as GenePT [3] and GPTCelltype [4]), suggesting a promising path for developing domain-specific FMs.
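To make the zero-shot combination described in the abstract concrete, here is a minimal sketch, not the authors' implementation: the embed_gene_description helper, the dimensions, and the expression-weighted average pooling are illustrative assumptions. In practice, the gene descriptions and their embeddings would come from an LLM and a text-embedding model, and the resulting cell embeddings could be passed to a downstream task such as clustering without training a new model.

```python
# Hedged sketch: combine per-gene text embeddings (stand-ins for LLM
# embeddings of gene descriptions) with a raw expression matrix to obtain
# cell embeddings, then cluster the cells zero-shot.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_cells, n_genes, emb_dim = 200, 50, 64

def embed_gene_description(gene: str) -> np.ndarray:
    """Hypothetical placeholder: in practice this would embed an
    LLM-generated description of `gene` with a text-embedding model."""
    return rng.normal(size=emb_dim)

genes = [f"GENE{i}" for i in range(n_genes)]
gene_emb = np.stack([embed_gene_description(g) for g in genes])  # (n_genes, emb_dim)

# Placeholder raw counts; normalize each cell's expression into weights.
X = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)
weights = X / np.clip(X.sum(axis=1, keepdims=True), 1e-9, None)

# Zero-shot cell embedding: expression-weighted average of gene embeddings.
cell_emb = weights @ gene_emb  # (n_cells, emb_dim)

# Downstream task without training a new model, e.g. cell clustering.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(cell_emb)
print(labels[:10])
```

The weighted-average pooling is one simple way to fuse text-derived gene embeddings with raw expression; other pooling or fine-tuned heads could be swapped in for tasks like annotation or perturbation modeling.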