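Feature selection
Phosphorylation
Computer science
Encoder
Feature extraction
Artificial intelligence
Transformer
Lysine
Computational biology
Pattern recognition (psychology)
Data mining
Biology
Biochemistry
Engineering
Amino acid
Voltage
Electrical engineering
Operating system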
Authors
Songning Lai,Xifeng Hu,Jing Han,Chun Wang,Subhas Chandra Mukhopadhyay,Zhi Li,Lan Ye
Identifier
DOI:10.1109/cisp-bmei56279.2022.9979871
Abstract
Phosphorylation, a post-translational modification of proteins, greatly affects protein structure and function and plays an important role in the pathogenesis of human diseases. Elucidating the molecular mechanism of phosphorylation is important for developing therapeutic agents for some diseases. Nowadays, the identification of phosphorylation sites is an active research topic. However, identifying phosphorylation sites by conventional experimental methods alone is difficult and costly. In this work, we focused on developing a model to predict the phosphorylation sites of lysine. The model uses protein feature acquisition, F_Score feature selection, KNN data cleaning, SMOTE synthesis of positive samples, and other algorithms to construct the feature set. Subsequently, the transformer-based BERT classification technique was applied to this prediction model. In the BERT model, the present study used two different methods of inputting feature sequences. The accuracies are 98.43% and 99.61%, and the MCC values are 96.5% and 99.1%, respectively, which are better than those of previous models for predicting phosphoglycerylation sites. These results suggest that the approach holds promise for future work.
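As a rough illustration of two quantities named in the abstract, the sketch below computes a per-feature F-score for ranking features and the Matthews correlation coefficient (MCC) for evaluating binary predictions. It is not the authors' implementation: the abstract does not define "F_Score", so the common Chen and Lin formulation is assumed here, and the function names `f_score` and `mcc` as well as the toy data are hypothetical.

```python
import numpy as np


def f_score(X, y):
    """Per-feature F-score for binary labels y in {0, 1}.

    Assumed formulation (Chen & Lin): squared distances of the class means
    from the overall mean, divided by the sum of the within-class sample
    variances. Larger values indicate more discriminative features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    mean_all = X.mean(axis=0)
    numerator = (pos.mean(axis=0) - mean_all) ** 2 + (neg.mean(axis=0) - mean_all) ** 2
    denominator = pos.var(axis=0, ddof=1) + neg.var(axis=0, ddof=1)
    # Guard against division by zero for constant features.
    return numerator / np.maximum(denominator, 1e-12)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_toy = np.array([[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 6.0]])
y_toy = np.array([0, 0, 1, 1])

scores = f_score(X_toy, y_toy)
# Feature indices ordered from highest to lowest F-score.
ranking = np.argsort(scores)[::-1]
```

Note that MCC computed this way lies in the range -1 to 1; the abstract reports it as a percentage, so 99.1% would correspond to 0.991 on this scale.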