RNA-ligand interaction scoring via data perturbation and augmentation modeling
计算生物学
计算机科学
生物系统
化学
生物
作者
Hongli Ma,Letian Gao,Yunfan Jin,Yilan Bai,Xiaofan Liu,Pengfei Bao,Ke Liu,Zhenjiang Zech Xu,Zhi John Lu
标识
DOI:10.1101/2024.06.26.600802
摘要
Abstract RNA-targeting drug discovery is undergoing an unprecedented revolution. Despite recent advances in this field, developing data-driven deep learning models remains challenging due to the limited availability of validated RNA-small molecule interactions and the scarcity of known RNA structures. In this context, we introduce RNAsmol, a novel sequence-based deep learning framework that incorporates data perturbation with augmentation, graph-based molecular feature representation and attention-based feature fusion modules to predict RNA-small molecule interactions. RNAsmol employs perturbation strategies to balance the bias between true negative and unknown interaction space thereby elucidating the intrinsic binding patterns between RNA and small molecules. The resulting model demonstrates accurate predictions of the binding between RNA and small molecules, outperforming other methods with average improvements of ∼8% (AUROC) in 10-fold cross-validation, ∼16% (AUROC) in cold evaluation (on unseen datasets), and ∼30% (ranking score) in decoy evaluation. Moreover, we use case studies to validate molecular binding hotspots in the prediction of RNAsmol, proving the model’s interpretability. In particular, we demonstrate that RNAsmol, without requiring structural input, can generate reliable predictions and be adapted to many RNA-targeting drug design scenarios.