Computer science
Lysine
Artificial intelligence
Transfer learning
Machine learning
Transfer (computing)
Computational biology
Chemistry
Biology
Biochemistry
Amino acid
Parallel computing
Authors
Yu Qiao, Xiaolei Zhu, H. Gong
Source
Journal: Bioinformatics
[Oxford University Press]
Date: 2021-10-13
Volume/Issue: 38 (3): 648-654
Citations: 35
Identifier
DOI: 10.1093/bioinformatics/btab712
Abstract
As one of the most important post-translational modifications (PTMs), protein lysine crotonylation (Kcr) has attracted wide attention; it is involved in important physiological activities such as cell differentiation and metabolism. However, experimental methods for Kcr identification are expensive and time-consuming. Computational methods, in contrast, can predict Kcr sites in silico with high efficiency and low cost. In this study, we propose a novel predictor, BERT-Kcr, for protein Kcr site prediction, developed with a transfer learning method based on pre-trained bidirectional encoder representations from transformers (BERT) models. These models were originally designed for natural language processing (NLP) tasks, such as sentence classification. Here, we treat each amino acid as a word and feed the resulting sequence to the pre-trained BERT model. The features encoded by BERT are extracted and then fed to a BiLSTM network to build our final model. Compared with models built on other machine learning and deep learning classifiers, BERT-Kcr achieved the best performance, with an AUROC of 0.983 under 10-fold cross-validation. Further evaluation on the independent test set indicates that BERT-Kcr outperforms the state-of-the-art model Deep-Kcr, improving AUROC by about 5%. These results indicate that the direct use of sequence information together with advanced pre-trained NLP models can be an effective way to identify PTM sites of proteins. The BERT-Kcr model is publicly available at http://zhulab.org.cn/BERT-Kcr_models/. Supplementary data are available at Bioinformatics online.
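The pipeline the abstract describes (each residue treated as a "word" fed to a pre-trained BERT, with the extracted features passed to a BiLSTM classifier) can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the abstract does not name the pre-trained checkpoint, so the protein-domain model Rostlab/prot_bert, the 15-residue window, and the layer sizes are all assumptions made here for the example.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Protein-domain BERT in which each residue is a separate "word"
# (Rostlab/prot_bert is an assumed stand-in; the paper's checkpoint may differ).
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
bert = BertModel.from_pretrained("Rostlab/prot_bert")
bert.eval()  # used as a frozen feature extractor in this sketch


class BiLSTMHead(nn.Module):
    """BiLSTM classifier over per-residue BERT embeddings (sizes are illustrative)."""

    def __init__(self, embed_dim=1024, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, 1)  # Kcr vs. non-Kcr score

    def forward(self, embeddings):  # embeddings: (batch, seq_len, embed_dim)
        out, _ = self.lstm(embeddings)
        # Use the final time step of the BiLSTM output as the summary vector.
        return torch.sigmoid(self.fc(out[:, -1, :]))


def kcr_score(window, head):
    """Score a peptide window centered on a candidate lysine (K)."""
    spaced = " ".join(window)  # "A G V L S ..." - residues as words
    tokens = tokenizer(spaced, return_tensors="pt")
    with torch.no_grad():
        features = bert(**tokens).last_hidden_state  # (1, len + 2, 1024)
    return head(features).item()


head = BiLSTMHead()  # untrained here; in practice it is trained on labeled Kcr windows
print(kcr_score("AGVLSDEKTSRMNQP", head))  # hypothetical window, candidate K at center
```

Freezing BERT and training only the BiLSTM head is one plausible reading of "features encoded by BERT were extracted and then fed to a BiLSTM network"; fine-tuning the full model end-to-end would be the main alternative design.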