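Computer science
Distillation
Artificial intelligence
Data mining
Machine learning
Knowledge-based systems
Data modeling
Email
Pattern recognition (psychology)
Knowledge engineering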
Authors
Wenyu Xi,Ruheng Wang,Xiucai Ye,Tetsuya Sakurai,Leyi Wei
Identifier
DOI:10.1109/jbhi.2026.3686853
Abstract
Accurate prediction of protein-ligand interactions is essential for drug discovery, supporting critical stages from lead optimization to therapeutic development. Many existing methods depend on high-resolution protein-ligand complex structures, which limits scalability and reduces robustness in structure-limited settings. To address these challenges, we introduce Multi-Combinatorial Knowledge Distillation (MCKD), a sequence-based framework that predicts protein-ligand interactions without requiring explicit three-dimensional structures at inference time. MCKD represents proteins and ligands as two-dimensional molecular graphs derived from their sequences and physicochemical properties, enabling effective learning from readily available inputs. To incorporate structural knowledge beyond sequence information, MCKD employs a hybrid distillation strategy that combines cross-modal distillation from a structure-based teacher with self-distillation to improve representation consistency across layers. To model protein-ligand interactions explicitly, MCKD integrates a bilinear attention network that captures residue-atom level associations and supports both binding affinity regression and binary interaction classification. Evaluations on multiple public benchmark datasets show that MCKD consistently outperforms existing sequence-based methods and achieves performance comparable to structure-based approaches. The model also generalizes well to unseen proteins and novel ligand scaffolds, while providing interpretable insights into key molecular interaction regions. These results suggest that MCKD offers a scalable and effective solution for protein-ligand interaction prediction, particularly for structure-free and data-limited drug discovery applications.
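The hybrid distillation strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss function, so everything specific here is an assumption made for illustration: the use of mean squared error for both terms, the weights `alpha` and `beta`, the function names, and the representation of each layer as a plain vector. In real training the teacher would be frozen and the deepest student layer would typically be treated as a fixed target for the self-distillation term, which this scalar sketch does not model.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hybrid_distillation_loss(
    task_loss: float,
    student_layers: List[Vector],
    teacher_repr: Vector,
    alpha: float = 0.5,  # assumed weight of the cross-modal term
    beta: float = 0.1,   # assumed weight of the self-distillation term
) -> float:
    """Combine a task loss with two distillation terms (illustrative only).

    student_layers: per-layer representations of the sequence-based student,
        ordered shallow to deep; the last entry is the final representation.
    teacher_repr: representation produced by a structure-based teacher.
    """
    if not student_layers:
        raise ValueError("student_layers must not be empty")
    final = student_layers[-1]

    # Cross-modal distillation: pull the student's final representation
    # toward the structure-based teacher's representation.
    cross_modal = mse(final, teacher_repr)

    # Self-distillation: pull each shallower layer toward the deepest layer,
    # averaged over the shallower layers (zero if there is only one layer).
    shallow = student_layers[:-1]
    self_distill = (
        sum(mse(h, final) for h in shallow) / len(shallow) if shallow else 0.0
    )

    return task_loss + alpha * cross_modal + beta * self_distill


if __name__ == "__main__":
    layers = [[0.0, 0.0], [1.0, 1.0]]  # two toy student layers
    teacher = [1.0, 3.0]               # toy teacher representation
    print(hybrid_distillation_loss(1.0, layers, teacher))
```

The two weights control how strongly the student is pushed toward the teacher versus toward its own deepest layer; setting both to zero recovers plain supervised training on the task loss.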