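Computer science
Matthews correlation coefficient
Protein ligand
Drug discovery
Sequence (biology)
Classifier (UML)
Artificial intelligence
Ligand (biochemistry)
Computational biology
Data mining
Pattern recognition (psychology)
Chemistry
Bioinformatics
Biology
Support vector machine
Biochemistry
Receptor
Authors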
Yongsheng Ding, Jijun Tang
Identifier
DOI:10.1021/acs.jcim.7b00307
Abstract
Identifying protein–ligand binding sites is an important step in drug discovery and structure-based drug design. Detecting these sites with traditional experimental methods is expensive and time-consuming, so computational approaches offer effective alternatives. Many recent computational methods rely on protein structure information. However, these methods apply only when both the sequence of the protein target is known and sufficient 3D structure information is available. Studies indicate that sequence-based computational approaches for predicting protein–ligand binding sites are more practical. In this paper, we propose a novel sequence-based computational model for predicting protein–ligand binding sites. We apply the Discrete Cosine Transform (DCT) to extract features from the Position-Specific Score Matrix (PSSM). To improve accuracy, Predicted Relative Solvent Accessibility (PRSA) information is also used. The predictor is built with an ensemble weighted sparse representation model combined with random under-sampling. To evaluate our method, we conduct comprehensive tests on test sets for 12 ligand types. Results show that our method achieves a better Matthews correlation coefficient (MCC) than other leading methods on the independent test sets of ATP (0.506), ADP (0.511), AMP (0.393), GDP (0.579), GTP (0.641), Mg2+ (0.317), Fe3+ (0.490) and HEME (0.640). In terms of MCC, our proposed method outperforms earlier predictors on 8 of the 12 ligand types.
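The abstract reports its results as Matthews correlation coefficient (MCC) values. For readers unfamiliar with the metric, the following is a minimal sketch of how MCC is computed from per-residue binary labels (1 = binding residue, 0 = non-binding). The function names `confusion_counts` and `matthews_corrcoef` and the toy labels are illustrative assumptions, not taken from the paper, and the convention of returning 0.0 when the denominator is zero is a common choice rather than something the abstract specifies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (hypothetical labels): 2 binding residues among 6.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(round(mcc, 3))
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it more informative than plain accuracy when binding residues are a small minority of all residues.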