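Artificial intelligence
Computer science
Machine learning
Deep learning
Binary number
Binary classification
Transformation (genetics)
Function (biology)
Algorithm
Pattern recognition (psychology)
Data mining
Support vector machine
Mathematics
Biochemistry
Chemistry
Evolutionary biology
Biology
Gene
Arithmetic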
Authors
Jacob Barger,Badri Adhikari
Identifier
DOI:10.1109/tcbb.2021.3115053
Abstract
Much of the recent success in protein structure prediction has resulted from accurate protein contact prediction, a binary classification problem. As an alternative, we recently proposed real-valued distance prediction, which formulates the problem as regression. The nuances of protein 3D structures make this formulation appropriate, allowing predictions to reflect inter-residue distances as they occur in nature. Despite this promise, the accurate prediction of real-valued distances remains relatively unexplored. To investigate whether regression methods can be designed to predict real-valued distances as precisely as binary contacts, we propose multiple novel methods of input label engineering, with the goal of optimizing the distribution of distances to suit the loss function of the deep-learning model. Our results demonstrate, for the first time, that deep learning methods for real-valued protein distance prediction can deliver distances as precise as binary classification methods. When using an optimal distance transformation function on the standard PSICOV dataset of 150 representative proteins, the precision of 'top-all' long-range contacts improves from 60.9% to 61.4% when predicting real-valued distances instead of contacts. When building three-dimensional models, we observed an average TM-score increase from 0.61 to 0.72, highlighting the advantage of predicting real-valued distances.
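To make the comparison in the abstract concrete, the following is a minimal sketch of how predicted real-valued distances might be scored as contacts. It is not the authors' evaluation code. The 8 Å contact cutoff and the sequence separation of 24 residues for "long-range" pairs are common conventions assumed here, not values stated in the abstract, and the function names, the parameter `k`, and the toy matrices are hypothetical. The abstract does not define 'top-all', so the number of ranked pairs is left as a parameter.

```python
from typing import List, Sequence, Tuple

CONTACT_THRESHOLD = 8.0  # angstroms; assumed conventional contact cutoff
LONG_RANGE_SEP = 24      # assumed minimum sequence separation for long-range pairs


def long_range_pairs(n: int, min_sep: int = LONG_RANGE_SEP) -> List[Tuple[int, int]]:
    """Return all residue pairs (i, j) with j - i >= min_sep."""
    return [(i, j) for i in range(n) for j in range(i + min_sep, n)]


def top_k_precision(
    pred: Sequence[Sequence[float]],
    true: Sequence[Sequence[float]],
    k: int,
    min_sep: int = LONG_RANGE_SEP,
    threshold: float = CONTACT_THRESHOLD,
) -> float:
    """Rank long-range pairs by predicted distance (shortest first) and
    return the fraction of the top k whose true distance is below threshold."""
    pairs = long_range_pairs(len(pred), min_sep)
    ranked = sorted(pairs, key=lambda p: pred[p[0]][p[1]])[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for i, j in ranked if true[i][j] < threshold)
    return hits / len(ranked)


# Toy 4-residue example (made-up distances); min_sep is lowered to 2 so that
# a few pairs qualify in such a short chain.
true_dist = [
    [0.0, 3.8, 6.0, 12.0],
    [3.8, 0.0, 3.8, 7.5],
    [6.0, 3.8, 0.0, 3.8],
    [12.0, 7.5, 3.8, 0.0],
]
pred_dist = [
    [0.0, 3.8, 5.5, 7.0],
    [3.8, 0.0, 3.8, 9.0],
    [5.5, 3.8, 0.0, 3.8],
    [7.0, 9.0, 3.8, 0.0],
]

print(top_k_precision(pred_dist, true_dist, k=2, min_sep=2))  # 0.5
```

In the toy example, the two pairs with the shortest predicted distances are (0, 2) and (0, 3); only the first is a true contact under the assumed cutoff, so the precision is 0.5. A binary contact predictor could be scored with the same routine by ranking pairs on contact probability in descending order instead.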