A multimodal conversational agent for DNA, RNA and protein tasks

DNA, RNA, computer science, computational biology, human-computer interaction, biology, genetics, gene
Authors
Bernardo P. de Almeida,Guillaume Richard,Hugo Dalla-Torre,Christopher Blum,Lorenz Hexemer,Priyanka Pandey,Stefan Laurent,Chandana Rajesh,Marie Lopez,Alexandre Laterre,Maren Lang,Uğur Şahin,Karim Beguir,Thomas Pierrot
Source
Journal: Nature Machine Intelligence [Nature Portfolio]
Volume (issue): 7 (6): 928-941; cited by: 19
Identifier
DOI:10.1038/s42256-025-01047-1
Abstract

Language models are thriving, powering conversational agents that assist and empower humans to solve a number of tasks. Recently, these models were extended to support additional modalities including vision, audio and video, demonstrating impressive capabilities across multiple domains, including healthcare. Still, conversational agents remain limited in biology as they cannot yet fully comprehend biological sequences. Meanwhile, high-performance foundation models for biological sequences have been built through self-supervision over sequencing data, but these need to be fine-tuned for each specific application, preventing generalization between tasks. In addition, these models are not conversational, which limits their utility to users with coding capabilities. Here we propose to bridge the gap between biology foundation models and conversational agents by introducing ChatNT, a multimodal conversational agent with an advanced understanding of biological sequences. ChatNT achieves new state-of-the-art results on the Nucleotide Transformer benchmark while being able to solve all tasks at once, in English, and to generalize to unseen questions. In addition, we have curated a set of more biologically relevant instruction tasks from DNA, RNA and proteins, spanning multiple species, tissues and biological processes. ChatNT reaches performance on par with state-of-the-art specialized methods on those tasks. We also present a perplexity-based technique to help calibrate the confidence of our model predictions. By applying attribution methods through the English decoder and DNA encoder, we demonstrate that ChatNT’s answers are based on biologically coherent features such as detecting the promoter TATA motif or splice site dinucleotides. Our framework for genomics instruction tuning can be extended to more tasks and data modalities (for example, structure and imaging), making it a widely applicable tool for biology. 
ChatNT provides a potential direction for building generally capable agents that understand biology from first principles while being accessible to users with no coding background.

De Almeida, Richard and colleagues leverage transfer learning to create ChatNT, a multimodal conversational agent for DNA, RNA and protein sequences that can be instructed in natural language.