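Sequence (biology)
Support vector machine
Computational biology
Computer science
Vector (molecular biology)
Artificial intelligence
Chemistry
Machine learning
Data mining
Biochemistry
Biology
Recombinant DNA
Gene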
Authors
Ghazaleh Taherzadeh, Yaoqi Zhou, Alan Wee-Chung Liew, Yuedong Yang
Identifier
DOI: 10.1021/acs.jcim.6b00320
Abstract
Carbohydrate-binding proteins play significant roles in many diseases, including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with an area under the receiver operating characteristic curve (AUC) of 0.78 and 0.77 and a Matthews correlation coefficient (MCC) of 0.34 and 0.29 for 10-fold cross-validation and the independent test, respectively, without balancing binding and nonbinding residues. The quality of the method is further demonstrated by two observations: statistically significantly more binding residues are predicted for carbohydrate-binding proteins than for presumptive nonbinding proteins in the human proteome, and, among nonsynonymous mutations from the 1000 Genomes Project, rare alleles are biased toward predicted carbohydrate-binding sites. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH.
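The abstract reports AUC and MCC on data where binding and nonbinding residues are not balanced. The following is a minimal sketch of how these two standard metrics are computed from per-residue labels and classifier scores. It is not SPRINT-CBH's code: the toy labels, scores, and the 0.5 decision threshold are hypothetical, chosen only to mimic the rarity of binding residues.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = binding, 0 = nonbinding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random binding residue outscores a
    random nonbinding residue (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one binding and one nonbinding residue")
    wins = 0.0
    for p in pos:  # simple O(n*m) pairwise comparison, fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 2 binding (1) vs 8 nonbinding (0) residues.
    labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.8, 0.3, 0.2, 0.1, 0.35, 0.05, 0.15, 0.25]
    threshold = 0.5  # hypothetical decision threshold on the classifier score
    preds = [1 if s >= threshold else 0 for s in scores]
    print("MCC:", mcc(labels, preds))       # MCC: 0.375
    print("AUC:", roc_auc(labels, scores))  # AUC: 0.9375
    # Predicting "nonbinding" everywhere is 80% accurate here, yet MCC is 0.
    print("all-nonbinding MCC:", mcc(labels, [0] * len(labels)))
```

AUC is threshold-free and depends only on how scores rank binding against nonbinding residues, while MCC depends on the chosen threshold. Both remain informative when one class dominates, which is why plain accuracy is a poor summary for unbalanced residue-level data. For proteome-scale inputs, a rank-based AUC computation would replace the quadratic loop.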