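Random forest
Computational biology
Pathogenicity
Context (archaeology)
Computer science
Amino acid residue
Artificial intelligence
Protein structure prediction
Sequence (biology)
Bioinformatics
Machine learning
Peptide sequence
Genetics
Gene
Biology
Protein structure
Biochemistry
Microbiology
Paleontology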
Authors
Xiong Yao,Jingbo Zhou,Ke An,Wei Han,Tao Wang,Zhiqiang Ye,Yun Na Wu
Abstract
The wide application of gene sequencing has accumulated numerous amino acid substitutions (AASs) of unknown significance, posing significant challenges to predicting and understanding their pathogenicity. While various prediction methods have been proposed, most are sequence-based and offer little insight into molecular mechanisms from the perspective of protein structure. Moreover, prediction performance must be improved. Herein, we trained a random forest (RF) prediction model, namely AAS3D-RF, that emphasizes sequence and three-dimensional (3D) structure-based features to explore the relationship between diseases and AASs. AAS3D-RF was trained on more than 14,000 AASs with 21 selected features, and obtained accuracy (ACC) between 0.811 and 0.839 and Matthews correlation coefficient (MCC) between 0.591 and 0.684 on two independent testing datasets, superior to seven existing tools. In addition, AAS3D-RF possesses unique structure-based features, context-dependent substitution score (CDSS) and environment-dependent residue contact energy (ERCE), which could be applied to interpret whether pathogenic AASs would introduce incompatibilities to the protein structural microenvironments. AAS3D-RF serves as a valuable tool for both predicting and understanding pathogenic AASs.
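For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how ACC and MCC are conventionally computed from the confusion counts of a binary classifier (pathogenic vs. neutral). The function name `acc_and_mcc` and the example counts are illustrative assumptions and are not taken from the paper; the abstract does not report confusion matrices.

```python
import math


def acc_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, Matthews correlation coefficient) for a binary classifier.

    tp, tn, fp, fn are the confusion-matrix counts. The name and signature
    are hypothetical, chosen for illustration only.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Accuracy: fraction of all predictions that are correct.
    acc = (tp + tn) / total

    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return acc, mcc


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    acc, mcc = acc_and_mcc(tp=40, tn=45, fp=5, fn=10)
    print(f"ACC={acc:.3f} MCC={mcc:.3f}")
```

MCC is often reported alongside ACC because it accounts for all four confusion-matrix cells and is less flattering than accuracy when the two classes are imbalanced.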