Learning functional properties of proteins with language models

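Computer science; Representation (politics); Benchmark (surveying); Biomedicine; Artificial intelligence; Field (mathematics); Machine learning; Function (biology); Protein structure prediction; Benchmarking; Deep learning; Protein structure; Bioinformatics; Biology; Mathematics; Biochemistry; Geography; Business; Law; Pure mathematics; Political science; Politics; Evolutionary biology; Marketing; Geodesy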
Authors
Serbulent Unsal,Heval Ataş,Muammer Albayrak,Kemal Turhan,Aybar C. Acar,Tunca Doğan
Source
Journal: Nature Machine Intelligence [Springer Nature]
Volume (issue): 4 (3): 227-245 Citations: 60
Identifier
DOI:10.1038/s42256-022-00457-9
Abstract

Data-centric approaches have been used to develop predictive methods for elucidating uncharacterized properties of proteins; however, studies indicate that these methods should be further improved to effectively solve critical problems in biomedicine and biotechnology, which can be achieved by better representing the data at hand. Novel data representation approaches mostly take inspiration from language models that have yielded ground-breaking improvements in natural language processing. Lately, these approaches have been applied to the field of protein science and have displayed highly promising results in terms of extracting complex sequence–structure–function relationships. In this study we conducted a detailed investigation over protein representation learning by first categorizing/explaining each approach, subsequently benchmarking their performances on predicting: (1) semantic similarities between proteins, (2) ontology-based protein functions, (3) drug target protein families and (4) protein–protein binding affinity changes following mutations. We evaluate and discuss the advantages and disadvantages of each method over the benchmark results, source datasets and algorithms used, in comparison with classical model-driven approaches. Finally, we discuss current challenges and suggest future directions. We believe that the conclusions of this study will help researchers to apply machine/deep learning-based representation techniques to protein data for various predictive tasks, and inspire the development of novel methods.