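Benchmarking
Computer science
Drug target
Binary classification
Machine learning
Set (abstract data type)
Artificial intelligence
Test apparatus
Binary number
Data mining
Drug discovery
Computational biology
Support vector machine
Bioinformatics
Pharmacology
Mathematics
Biology
Marketing
Business
Arithmetic
Programming language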
Authors
Tapio Pahikkala, Antti Airola, Samuli Pietilä, Sushil Kumar Shakyawar, Agnieszka Szwajda, Jing Tang, Tero Aittokallio
Abstract
A number of supervised machine learning models have recently been introduced for the prediction of drug-target interactions based on chemical structure and genomic sequence information. Although these models could offer improved means for many network pharmacology applications, such as repositioning of drugs for new therapeutic uses, the prediction models are often constructed and evaluated under overly simplified settings that do not reflect the real-life problem in practical applications. Using quantitative drug-target bioactivity assays for kinase inhibitors, as well as a popular benchmarking data set of binary drug-target interactions for enzyme, ion channel, nuclear receptor and G protein-coupled receptor targets, we illustrate here the effects of four factors that may lead to dramatic differences in the prediction results: (i) problem formulation (standard binary classification or more realistic regression formulation), (ii) evaluation data set (drug and target families in the application use case), (iii) evaluation procedure (simple or nested cross-validation) and (iv) experimental setting (whether training and test sets share common drugs and targets, only drugs or targets, or neither). Each of these factors should be taken into consideration to avoid reporting overoptimistic drug-target interaction prediction results. We also suggest guidelines on how to make supervised drug-target interaction prediction studies more realistic, through model formulations and evaluation setups that better address the inherent complexity of the prediction task in practical applications, as well as novel benchmarking data sets that capture the continuous nature of the drug-target interactions for kinase inhibitors.
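Factor (iv) can be made concrete with a small sketch. The Python below is an illustration only, not the authors' code: the function names (`split_by_setting`, `random_pair_split`), the group labels and the toy drug and target identifiers are all hypothetical. It partitions every drug-target pair according to whether its drug, its target, both, or neither have been held out of training.

```python
import random
from itertools import product


def split_by_setting(drugs, targets, held_out_drugs, held_out_targets):
    """Group all drug-target pairs by what they share with the training block.

    Hypothetical helper for illustration; the group labels are not taken
    from the paper.
    """
    held_d, held_t = set(held_out_drugs), set(held_out_targets)
    groups = {"train": [], "new_drug": [], "new_target": [], "both_new": []}
    for d, t in product(drugs, targets):
        if d in held_d and t in held_t:
            groups["both_new"].append((d, t))    # neither drug nor target seen
        elif d in held_d:
            groups["new_drug"].append((d, t))    # target seen, drug unseen
        elif t in held_t:
            groups["new_target"].append((d, t))  # drug seen, target unseen
        else:
            groups["train"].append((d, t))       # drug and target both seen
    return groups


def random_pair_split(pairs, test_fraction=0.25, seed=0):
    """Split pairs at random, so test pairs usually share drugs and targets
    with training (not guaranteed on very small data sets)."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy identifiers, assumed purely for the example.
drugs = ["d1", "d2", "d3", "d4"]
targets = ["t1", "t2", "t3"]
groups = split_by_setting(drugs, targets, held_out_drugs=["d4"], held_out_targets=["t3"])
train_pairs, shared_test_pairs = random_pair_split(groups["train"])
```

Evaluating a model separately on `shared_test_pairs`, `new_drug`, `new_target` and `both_new` keeps the four experimental settings apart, so an easy setting cannot mask poor performance on a harder one.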