Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

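Inference; Computer science; Machine learning; Robustness (evolution); Hidden Markov model; Artificial intelligence; Transformer; Autoregressive model; Biology; Engineering; Genetics; Mathematics; Voltage; Electrical engineering; Econometrics; Gene
Authors
Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena-Hurtado, Aidan N. Gomez, Debora S. Marks, Yarin Gal
Source
Journal: Cornell University - arXiv. Citations: 32
Identifier
DOI: 10.48550/arxiv.2205.13760
Abstract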

The ability to accurately model the fitness landscape of protein sequences is critical to a wide range of applications, from quantifying the effects of human variants on disease likelihood, to predicting immune-escape mutations in viruses and designing novel biotherapeutic proteins. Deep generative models of protein sequences trained on multiple sequence alignments have been the most successful approaches so far to address these tasks. The performance of these methods is, however, contingent on the availability of sufficiently deep and diverse alignments for reliable training. Their potential scope is thus limited by the fact that many protein families are hard, if not impossible, to align. Large language models trained on massive quantities of non-aligned protein sequences from diverse families address these problems and show potential to eventually bridge the performance gap. We introduce Tranception, a novel transformer architecture leveraging autoregressive predictions and retrieval of homologous sequences at inference to achieve state-of-the-art fitness prediction performance. Given its markedly higher performance on multiple mutants, robustness to shallow alignments, and ability to score indels, our approach offers a significant gain of scope over existing approaches. To enable more rigorous model testing across a broader range of protein families, we develop ProteinGym, an extensive set of multiplexed assays of variant effects that substantially increases both the number and diversity of assays compared to existing benchmarks.