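Fuse (electrical)
Virulence factor
Path (computing)
Feature (linguistics)
Computer science
Dual (grammatical number)
Factor (programming language)
Virulence
Architecture
Artificial intelligence
Biology
Engineering
Gene
Programming language
Electrical engineering
Genetics
Art
Linguistics
Philosophy
Literature
Visual arts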
Authors
Lucheng Huang, Xiangyu Yu, Shumei Li, Qingwei Chen, Dan Xu, Qi Zhao
Source
Journal: PubMed
Date: 2025-08-31
Volume (issue): 26 (5)
Abstract
Accurate prediction of bacterial virulence factors (VFs) is crucial for combating infectious diseases, yet traditional methods often fail to capture their complex sequence properties. We address this challenge by leveraging deep, context-aware representations from large-scale protein language models (PLMs). Our framework begins with a systematic engineering of features from ESM-2 and ProtT5, which confirmed their complementary nature but also revealed that simple concatenation is a suboptimal fusion strategy due to a "feature overshadowing" effect. To overcome this, we developed two novel architectures: VF-Iter, for robust feature enhancement via iterative low-rank updates, and the Dual-Path Feature Fusion (DPF) network, for intelligently integrating the complementary embeddings. The construction of our final model, VF-Fuse, involved a two-stage process. First, we selected four powerful and diverse base models representing our distinct feature strategies (ESM-2 only, ProtT5 only, simple concatenation, and DPF). Second, we empirically determined the best method for combining their predictions by benchmarking 15 ensemble techniques, from which Majority Voting emerged as the superior choice. On the independent test set, VF-Fuse establishes a new state of the art, achieving a superior F1-Score of 87.15% and a Matthews Correlation Coefficient of 73.61%. This F1-Score marks a significant 3.3% improvement over the previous best method, driven by an excellent balance between a high Sensitivity of 90.1% and a strong Specificity of 83.33%. Crucially, in-depth interpretability analyses validated our architectural design, demonstrating how the DPF model learns to intelligently route complementary features to specialized pathways.
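The final ensembling step and the reported metrics can be illustrated with a minimal sketch. It assumes hard (label-level) voting over four binary classifiers and resolves 2-2 ties in favour of the positive class; the abstract specifies neither detail, so both are assumptions. The function names and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from math import sqrt


def majority_vote(predictions, tie_break=1):
    """Combine per-model 0/1 predictions by hard majority voting.

    predictions: list of equal-length lists, one per base model.
    tie_break: label used when votes are split evenly (an assumption;
    with four base models a 2-2 tie is possible).
    """
    n_models = len(predictions)
    voted = []
    for votes in zip(*predictions):
        positives = sum(votes)
        negatives = n_models - positives
        if positives > negatives:
            voted.append(1)
        elif negatives > positives:
            voted.append(0)
        else:
            voted.append(tie_break)
    return voted


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp, tn, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient; returns 0.0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy predictions standing in for the four base models (made-up values).
esm2_only = [1, 0, 1, 1, 0, 0]
prott5_only = [1, 0, 0, 1, 0, 1]
concat = [1, 1, 0, 1, 0, 0]
dpf = [1, 0, 1, 0, 0, 1]
y_true = [1, 0, 1, 1, 0, 0]

ensemble = majority_vote([esm2_only, prott5_only, concat, dpf])
counts = confusion_counts(y_true, ensemble)
print("ensemble:", ensemble)
print("F1:", round(f1_score(*counts), 4), "MCC:", round(mcc(*counts), 4))
```

With an even number of voters, the tie-breaking rule directly trades sensitivity against specificity, so it is worth stating explicitly when reproducing results of this kind.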