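Mutation
Computer science
Dynamics (music)
Encoding
Protein folding
Protein dynamics
Sequence (biology)
Artificial intelligence
Language model
Folding (DSP implementation)
Protein structure
Evolutionary dynamics
Property (philosophy)
Scale (ratio)
Molecular dynamics
Protein design
Computational biology
Protein structure prediction
Silent mutation
Protein sequencing
Biological system
Protein-protein interaction
Mode (computer interface)
Computational model
Transition (genetics)
Machine learning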
Authors
Chao Hou, Haiqing Zhao, Yufeng Shen
Identifier
DOI: 10.1073/pnas.2530466123
Abstract
Structural dynamics are fundamental to protein function and mutation effects. Current protein deep learning models are predominantly trained on sequence and/or static structure data, and so often fail to capture the dynamic nature of proteins. To address this, we introduce SeqDance and ESMDance, two protein language models trained on dynamic biophysical properties derived from molecular dynamics simulations and normal mode analyses of over 64,000 proteins. Both models can be directly applied to predict dynamic properties of unseen ordered and disordered proteins. SeqDance, trained from scratch, has attention maps that capture dynamic interactions and co-movement between residues, and its embeddings encode rich representations of protein dynamics that can be further used, via transfer learning, to predict conformational properties beyond the training tasks. Changes in dynamic properties predicted by SeqDance reflect mutation effects on protein folding stability. ESMDance, built upon ESM2 (Evolutionary Scale Model II) outputs, substantially outperforms ESM2 in zero-shot prediction of mutation effects for designed and viral proteins, which lack evolutionary information. Together, SeqDance and ESMDance offer a framework for integrating protein dynamics into language models, enabling more generalizable predictions of protein behavior and mutation effects.