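Gene nomenclature
Set (abstract data type)
Functional genomics
Pipeline (software)
Context (archaeology)
Genomics
Function (biology)
Computational biology
Gene Ontology
Gene
Computer science
Benchmarking
Gene annotation
Ontology
Biology
Genetics
Gene expression
Genome
Programming language
Taxonomy (biology)
Paleontology
Plant
Marketing
Nomenclature
Business
Philosophy
Epistemology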
Authors
Dexter Pratt, Mengzhou Hu, Sahar Alkhairy, Ingoo Lee, Rudolf Pillich, Robin E. Bachelder, Trey Ideker
Source
Journal: Research Square
Date: 2023-09-18
Citations: 5
Identifier
DOI: 10.21203/rs.3.rs-3270331/v1
Abstract
Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI's GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. Benchmarking against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. In gene sets discovered in 'omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants.