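Computer science
Encoding
Artificial intelligence
Protein-protein interaction
Machine learning
Property (philosophy)
Solubility
Protein structure prediction
Fusion
Semantics (computer science)
Natural language processing
Computational complexity theory
Deep learning
Encoding (memory)
Pattern recognition (psychology)
Component (thermodynamics)
Structured prediction
Sensor fusion
Authors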
Yuhang Zhang,Peilin Chen,Keyan Ding,Han Liu,Shiqi Wang,Qi Song
Identifier
DOI:10.1109/jbhi.2025.3608273
Abstract
Protein solubility is a critical determinant of biologic candidates' developability, stability, and therapeutic efficacy. However, accurate solubility prediction remains a central challenge in computational protein engineering due to the inherent complexity within protein sequences. In this work, we propose a multimodal prompt learning framework, called MPSol, for protein solubility prediction that integrates complementary representations derived from primary sequences, structural proxies, and textual descriptions generated by large language models (LLMs). MPSol is built upon a unified multimodal backbone with a dedicated cross-modal fusion module that captures fine-grained interactions across modalities. In addition, we design label-aware prompts that encode solubility-specific semantic cues associated with each class. These prompts provide semantic supervision, guiding the alignment of fused protein representations to promote semantic consistency. Extensive experiments demonstrate that MPSol achieves state-of-the-art performance, reaching an accuracy of 0.815, AUC of 0.867 and MCC of 0.642 on the standard PDBSol test set, and generalizes well to the external out-of-distribution test dataset with an accuracy of 0.632, AUC of 0.653 and MCC of 0.332. These results underscore the potential of prompt-driven multimodal learning for interpretable and effective protein property prediction.
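For readers less familiar with the reported metrics, the sketch below shows how accuracy and the Matthews correlation coefficient (MCC) are conventionally computed for a binary soluble/insoluble classifier. It is a generic illustration, not the authors' evaluation code: the function names and the toy labels are hypothetical, and the abstract does not specify the decision threshold or evaluation pipeline used for MPSol.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = soluble, 0 = insoluble)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of predictions that match the true label."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels (hypothetical, for illustration only; not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which is why it is often reported alongside accuracy and AUC when the two classes may be imbalanced.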