Workflow
Computer science
Quality (philosophy)
Training (meteorology)
Data mining
Database
Physics
Epistemology
Philosophy
Meteorology
Authors
Yingze Wang,Kunyang Sun,Jie Li,Xingyi Guan,Oufan Zhang,Dorian Bagni,Yang Zhang,Heather A. Carlson,Teresa Head‐Gordon
Source
Journal: Digital Discovery
[The Royal Society of Chemistry]
Date: 2025-01-01
Abstract
Developing scoring functions (SFs) to predict protein-ligand binding energies requires high-quality 3D structures and binding assay data for training and testing their parameters. In this work, we show that a widely used dataset, PDBbind, suffers from several common structural artifacts in both proteins and ligands, which may compromise the accuracy, reliability, and generalizability of the resulting SFs. We have therefore developed a series of algorithms, organized in a semi-automated workflow, HiQBind-WF, that curates non-covalent protein-ligand datasets to fix these problems. We also used this workflow to create an independent dataset, HiQBind, by matching binding free energies from various sources, including BioLiP, Binding MOAD, and BindingDB, with co-crystallized ligand-protein complexes from the PDB. The resulting HiQBind workflow and dataset are designed to ensure reproducibility and minimize human intervention, and both are open-source to foster transparency in the improvements made to this important resource for the biology and drug discovery communities.