CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach

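Transformer, Computer science, Artificial intelligence, Test apparatus, Linear variable differential transformer, Biological system, Algorithm, Materials science, Distribution transformer, Voltage, Biology, Engineering, Electrical engineering
Authors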
Eesha Khare,Constancio González‐Obeso,David L. Kaplan,Markus J. Buehler
Source
Journal: ACS Biomaterials Science & Engineering [American Chemical Society]
Volume (issue): 8 (10): 4301-4310 Citations: 27
Identifier
DOI:10.1021/acsbiomaterials.2c00737
Abstract

Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature Tm. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-Tm pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for a particular downstream task by utilizing sequence-to-Tm pairings, using a deep convolutional network to translate natural language processing BERT embeddings into required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R² values of test data (R² = 0.84 vs 0.79, respectively), but the ProtBERT is a much larger pretrained model that may not always be applicable for other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters compared to the much larger model but reaches almost the same accuracy for the test set. We compare the performance of both models against 71 newly published sequences for which Tm has been obtained as a validation set and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to our best knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems.
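
The abstract compares the two models by their R² on held-out test data. As a reading aid, the sketch below shows how a coefficient of determination could be computed for predicted versus measured melting temperatures. It assumes the common definition R² = 1 - SS_res / SS_tot; the abstract does not state which definition the authors used (a squared Pearson correlation is another convention). The function name `r_squared` and all numbers are hypothetical illustrations, not data or code from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical melting temperatures in degrees Celsius (NOT from the paper).
measured_tm = [30.0, 35.0, 40.0, 45.0, 50.0]
predicted_tm = [32.0, 34.0, 41.0, 43.0, 50.0]

score = r_squared(measured_tm, predicted_tm)
print(f"R^2 = {score:.2f}")
```

Under this definition, a model that always predicts the mean measured Tm scores 0 and a perfect model scores 1, which gives a scale for reading the reported 0.84 and 0.79.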