Authors
Weimin Li, Xiaoyang Li, Mengying Wang, Fangfang Liu, Yin Luo, Ruiqiang Guo, Quan-Ke Pan
Abstract
Predicting protein-ligand interactions is critical for drug discovery and repositioning. Traditional prediction methods are computationally intensive and limited in their ability to model structural changes. In contrast, data-driven deep learning methods significantly reduce computational costs and offer a more efficient route to drug discovery. However, existing models often fail to fully exploit metadata and low-frequency features, leading to suboptimal performance on sparse, imbalanced datasets. To address these challenges, this paper proposes a novel interaction prediction model based on heterogeneous graphs and data enhancement, named Heterogeneous Graph Enhanced Fusion Network (HGEF-Net). The model uses a heterogeneous information learning module that analyzes molecular subgraphs and substructures in depth and leverages metadata features to better capture the biological interactions between ligands and proteins. To address the issue of low-frequency category features, a data enhancement strategy based on multi-level contrastive learning is proposed. Furthermore, a heterogeneous attention integration framework is presented, which uses multi-level attention to assign different weights to different features. This framework fuses intramolecular and intermolecular features, enhancing the model's ability to capture key information and improving its performance on sparse, imbalanced datasets. Experimental results show that HGEF-Net outperforms other state-of-the-art models. On the BindingDB dataset (1:100 positive-to-negative ratio), HGEF-Net achieves an AUC of 0.826, an AUPRC of 0.811, a Precision of 0.715, and a Recall of 0.709. On the Davis dataset (1:10 ratio), the data enhancement module improves AUC, AUPRC, Precision, and Recall by 11.7%, 9.7%, 10.5%, and 16.3%, respectively, validating the model's effectiveness.
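To make the idea of attention-weighted feature fusion concrete, the following is a minimal sketch of a single attention level, not the HGEF-Net implementation. It assumes the intramolecular and intermolecular features are already encoded as equal-length vectors and scores them against a query vector with a dot product. The names `attention_fuse`, `intra`, `inter`, and `query`, as well as the toy values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse equal-length feature vectors with dot-product attention weights.

    Illustrative assumption: each feature vector is scored by its dot
    product with `query`, and the fused vector is the weighted sum.
    """
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Hypothetical toy embeddings standing in for learned features.
intra = [1.0, 0.0, 0.5]   # intramolecular feature vector
inter = [0.0, 1.0, 0.5]   # intermolecular feature vector
query = [1.0, 0.0, 0.0]   # stand-in for a learned query vector

fused, weights = attention_fuse([intra, inter], query)
```

In this toy setting the query aligns with the intramolecular vector, so that vector receives the larger weight. A multi-level scheme of the kind the abstract describes would presumably stack several such weighting steps, but the exact architecture is not specified here.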