TemStaPro: protein thermostability prediction using sequence representations from protein language models

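Thermostability, Sequence (biology), Computer science, Protein sequencing, Computational biology, Natural language processing, Peptide sequence, Biology, Biochemistry, Gene
Authors
Ieva Pudžiuvelytė,Kliment Olechnovič,Eglė Godliauskaitė,Kristupas Sermokas,Tomas Urbaitis,Giedrius Gasiūnas,Darius Kazlauskas
Source
Journal: Bioinformatics [Oxford University Press]
Identifier
DOI:10.1093/bioinformatics/btae157
Abstract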

Motivation: Reliable prediction of protein thermostability from its sequence is valuable for both academic and industrial research. This prediction problem can be tackled using machine learning and by taking advantage of the recent blossoming of deep learning methods for sequence analysis. These methods can facilitate training on more data and, possibly, enable development of more versatile thermostability predictors for multiple ranges of temperatures.

Results: We applied the principle of transfer learning to predict protein thermostability using embeddings generated by protein language models (pLMs) from an input protein sequence. We used large pLMs that were pre-trained on hundreds of millions of known sequences. The embeddings from such models allowed us to efficiently train and validate a high-performing prediction method using over one million sequences that we collected from organisms with annotated growth temperatures. Our method, TemStaPro (Temperatures of Stability for Proteins), was used to predict thermostability of CRISPR-Cas Class II effector proteins (C2EPs). Predictions indicated sharp differences among groups of C2EPs in terms of thermostability and were largely in tune with previously published and our newly obtained experimental data.

Availability and Implementation: TemStaPro software and the related data are freely available from https://github.com/ievapudz/TemStaPro and https://doi.org/10.5281/zenodo.7743637.
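
The transfer-learning idea in the abstract (a frozen pLM supplies embeddings, and a small classifier is trained on top of them) can be illustrated with a minimal sketch. This is not TemStaPro's code or architecture. The embeddings below are random synthetic stand-ins rather than real pLM output, and the embedding size, the single binary "stable above some threshold" label, the mean-pooling step, and the helper names `mean_pool`, `train_logistic`, `predict`, and `fake_embedding` are all assumptions made for illustration.

```python
import numpy as np

def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Collapse an (L, D) per-residue embedding matrix into one (D,) vector."""
    return per_residue.mean(axis=0)

def train_logistic(X: np.ndarray, y: np.ndarray, lr: float = 0.5, epochs: int = 500):
    """Fit a logistic-regression classifier with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probabilities
        grad = p - y                             # gradient of log-loss w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Return 1 where the classifier predicts the positive class, else 0."""
    return (X @ w + b > 0).astype(int)

# --- Synthetic stand-in for pLM embeddings (ASSUMPTION: not real data) ---
rng = np.random.default_rng(0)
DIM = 16  # hypothetical embedding width; real pLMs use far larger vectors

def fake_embedding(is_stable: bool) -> np.ndarray:
    """Random (L, DIM) matrix with a label-dependent shift in one feature."""
    length = rng.integers(50, 200)               # proteins differ in length
    emb = rng.normal(0.0, 1.0, size=(length, DIM))
    emb[:, 0] += 1.0 if is_stable else -1.0      # toy signal, purely illustrative
    return emb

labels = np.array([i % 2 for i in range(200)])
X = np.stack([mean_pool(fake_embedding(bool(label))) for label in labels])

# Train on the first 150 toy proteins, evaluate on the remaining 50.
w, b = train_logistic(X[:150], labels[:150])
accuracy = (predict(X[150:], w, b) == labels[150:]).mean()
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

Pooling turns variable-length sequences into fixed-size inputs, which is what lets a small classifier be trained cheaply once the embeddings are computed. To use real data, `fake_embedding` would be replaced by embeddings from an actual pLM; the released tool at the GitHub URL above is the authoritative reference for how TemStaPro itself does this.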