Abstract

Motivation
Proteins interact with a variety of molecules, including other proteins, DNA, RNA, ligands, ions, and lipids. These interactions play a crucial role in cellular communication, metabolic regulation, gene regulation, and structural integrity, making proteins fundamental to nearly all biological functions. Accurately predicting protein interaction (binding) sites is therefore essential for understanding protein interactions and function.

Results
In this work, we introduce MPBind, a multitask protein binding site prediction method that integrates protein language models (PLMs), which extract structural and functional information from sequences, with equivariant graph neural networks (EGNNs), which capture geometric features of 3D protein structures. Through multitask learning, MPBind predicts binding sites on proteins that interact with five key categories of binding partners: proteins, DNA/RNA, ligands, lipids, and ions. MPBind generalizes across the five molecular classes with state-of-the-art accuracy, achieving AUROC scores of 0.83 and 0.81 for protein–protein and protein–DNA/RNA binding site prediction, respectively. Moreover, MPBind outperforms both general and task-specific binding site prediction methods, making it a useful, versatile tool for protein binding site prediction.

Availability and implementation
The source code of MPBind is available at the GitHub repository: https://github.com/jianlin-cheng/MPBind.
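As a rough illustration of the multitask idea described in the Results, the sketch below applies one output head per binding-partner class to a shared per-residue feature vector. It is not MPBind's actual architecture: the task names, the feature dimension, the linear heads, and the random features standing in for PLM and EGNN outputs are all assumptions made for this example.

```python
import math
import random

# Hypothetical task labels for the five binding-partner classes.
TASKS = ["protein", "dna_rna", "ligand", "lipid", "ion"]


def sigmoid(x):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_binding(residue_features, heads):
    """Return, for each residue, one binding probability per task.

    residue_features: list of per-residue feature vectors (shared by all tasks).
    heads: dict mapping task name -> (weight vector, bias) for a linear head.
    """
    predictions = []
    for features in residue_features:
        row = {}
        for task in TASKS:
            weights, bias = heads[task]
            score = sum(w * x for w, x in zip(weights, features)) + bias
            row[task] = sigmoid(score)
        predictions.append(row)
    return predictions


random.seed(0)
feature_dim, num_residues = 16, 8

# Random stand-ins for the shared residue representation (assumed, not learned).
features = [
    [random.gauss(0, 1) for _ in range(feature_dim)] for _ in range(num_residues)
]

# One randomly initialised linear head per task (assumed, not trained).
heads = {
    task: ([random.gauss(0, 0.1) for _ in range(feature_dim)], 0.0)
    for task in TASKS
}

probs = predict_binding(features, heads)
```

Each entry of `probs` is a dictionary of five probabilities for one residue, so a residue can be flagged as binding for several partner classes at once. Sharing the feature vector across heads is what lets the tasks inform one another during training in a multitask setting.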