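Computer science
Robustness (evolution)
Sequence (biology)
Drug discovery
Drug target
Representation (politics)
Artificial intelligence
Training set
Protein sequencing
Task (project management)
Mean squared error
Machine learning
Computational biology
Data mining
Bioinformatics
Peptide sequence
Mathematics
Chemistry
Biology
Statistics
Genetics
Economics
Gene
Management
Law
Politics
Biochemistry
Political science
Authors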
Zhiqiang Hu, Wenfeng Liu, Chenbin Zhang, Jiawen Huang, Shaoting Zhang, Huiqun Yu, Yi Xiong, Hao Líu, Ke Song, Liang Hong
Abstract
Drug-target binding affinity prediction is a fundamental task in drug discovery and has been studied for decades. Most methods follow the canonical paradigm of processing the protein (target) and ligand (drug) inputs separately and then combining them. In this study, we demonstrate, surprisingly, that a model can achieve superior performance without access to any protein-sequence-related information. Instead, a protein is characterized entirely by the ligands it interacts with. Specifically, we treat each protein as a separate prediction task and train the tasks jointly in a multi-head manner, so as to learn a robust and universal ligand representation that generalizes across proteins. Empirical evidence shows that the novel paradigm outperforms its competitive sequence-based counterpart, DeepAffinity, with a Mean Squared Error (MSE) of 0.4261 versus 0.7612 and an R-squared of 0.7984 versus 0.6570. We also investigate the transfer learning scenario, in which unseen proteins are encountered after the initial training, and cross-dataset evaluation for prospective studies. The results reveal the robustness of the proposed model in generalizing to unseen proteins as well as in predicting future data. Source code and data are available at https://github.com/huzqatpku/SAM-DTA.
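To make the multi-head idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation (which is in the repository linked above). It assumes a ligand is already represented as a fixed-length feature vector and uses a single shared linear-plus-ReLU encoder with one linear regression head per protein. The class name `MultiHeadAffinityModel`, the method `add_head`, the layer sizes, and the random initialization are all hypothetical choices made for illustration. The two metric functions follow the standard definitions of MSE and R-squared, the metrics reported in the abstract.

```python
import random


class MultiHeadAffinityModel:
    """Illustrative only: shared ligand encoder, one regression head per protein."""

    def __init__(self, n_features, n_hidden, protein_ids, seed=0):
        self._rng = random.Random(seed)
        self.n_hidden = n_hidden
        # Shared encoder weights, applied to every ligand regardless of protein.
        self.encoder = [
            [self._rng.gauss(0.0, 0.1) for _ in range(n_features)]
            for _ in range(n_hidden)
        ]
        # One (weights, bias) head per protein; no protein sequence is used.
        self.heads = {}
        for protein_id in protein_ids:
            self.add_head(protein_id)

    def add_head(self, protein_id):
        # A new head can be attached later for a previously unseen protein,
        # while the shared encoder is reused (the transfer-learning setting).
        weights = [self._rng.gauss(0.0, 0.1) for _ in range(self.n_hidden)]
        self.heads[protein_id] = (weights, 0.0)

    def encode(self, ligand):
        # Linear layer followed by ReLU.
        return [
            max(0.0, sum(w * x for w, x in zip(row, ligand)))
            for row in self.encoder
        ]

    def predict(self, ligand, protein_id):
        # The protein identity only selects which head reads the embedding.
        hidden = self.encode(ligand)
        weights, bias = self.heads[protein_id]
        return sum(w * h for w, h in zip(weights, hidden)) + bias


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In this sketch, training would update the shared encoder with gradients from every protein's data, while each head sees only the ligands measured against its own protein. That shared update is what would let the ligand representation transfer to a head added afterwards with `add_head`.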