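Overfitting
Computer science
Artificial intelligence
Virtual screening
Redundancy (engineering)
Generalization
Similarity (geometry)
Machine learning
Measure (data warehouse)
Training set
Classifier (UML)
Data mining
Fingerprint (computing)
Similarity measure
Pattern recognition (psychology)
Mathematics
Drug discovery
Bioinformatics
Biology
Artificial neural network
Mathematical analysis
Image (mathematics)
Operating system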
Authors
Izhar Wallach, Abraham Heifets
Identifier
DOI: 10.1021/acs.jcim.7b00403
Abstract
Undetected overfitting can occur when there are significant redundancies between training and validation data. We describe AVE, a new measure of training-validation redundancy for ligand-based classification problems that accounts for the similarity amongst inactive molecules as well as active. We investigated seven widely-used benchmarks for virtual screening and classification, and show that the amount of AVE bias strongly correlates with the performance of ligand-based predictive methods irrespective of the predicted property, chemical fingerprint, similarity measure, or previously-applied unbiasing techniques. Therefore, it may be that the previously-reported performance of most ligand-based methods can be explained by overfitting to benchmarks rather than good prospective accuracy.
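
The abstract does not spell out how AVE is computed, so the following is only a rough sketch of the underlying idea: a validation set is "easy" when its actives sit closer to training actives than to training inactives, and its inactives sit closer to training inactives than to training actives. Everything here is an assumption made for illustration and may differ from the authors' definition: the function names (`tanimoto_distance`, `nn_coverage`, `redundancy_bias`), the representation of a fingerprint as a set of "on" bit indices, the evenly spaced threshold grid, and the use of `<=` in the threshold comparison.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical fingerprint representation: the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nn_coverage(validation: Sequence[Fingerprint],
                training: Sequence[Fingerprint],
                thresholds: Sequence[float]) -> float:
    """Average, over all thresholds, of the fraction of validation molecules
    whose nearest training molecule lies within the threshold distance."""
    nearest: List[float] = [
        min(tanimoto_distance(v, t) for t in training) for v in validation
    ]
    fractions = [
        sum(d <= thr for d in nearest) / len(nearest) for thr in thresholds
    ]
    return sum(fractions) / len(fractions)


def redundancy_bias(val_actives: Sequence[Fingerprint],
                    val_inactives: Sequence[Fingerprint],
                    train_actives: Sequence[Fingerprint],
                    train_inactives: Sequence[Fingerprint],
                    steps: int = 100) -> float:
    """Assumed form of the bias: (AA - AI) + (II - IA).

    Positive values mean validation molecules resemble training molecules of
    the same class more than those of the opposite class, so a nearest-neighbor
    lookup alone could score well on the split.
    """
    thresholds = [i / steps for i in range(steps + 1)]
    aa = nn_coverage(val_actives, train_actives, thresholds)
    ai = nn_coverage(val_actives, train_inactives, thresholds)
    ii = nn_coverage(val_inactives, train_inactives, thresholds)
    ia = nn_coverage(val_inactives, train_actives, thresholds)
    return (aa - ai) + (ii - ia)
```

Under these assumptions, a split whose validation molecules duplicate same-class training molecules yields a strongly positive value, while a split in which both classes are equally close to either training class yields a value near zero. Note that the inactive-to-inactive term is included alongside the active terms, in line with the abstract's point that similarity among inactive molecules matters too.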