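Random forest
Artificial intelligence
Estimator
Computer science
Machine learning
Matthews correlation coefficient
Classifier (UML)
Random projection
Benchmark (surveying)
Embedding
Correlation
Ensemble forecasting
Data mining
Mathematics
Support vector machine
Statistics
Geodesy
Geography
Geometry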
Authors
Haotian Wang, Rujun Li, Qin Yu, Liangzhen Jiang, Ximei Luo, Quan Zou, Zhibin Lv
Source
Journal: Biochemistry
[American Chemical Society]
Date: 2025-06-24
Volume (issue): 64 (14): 3137-3147
Citations: 3
Identifier
DOI: 10.1021/acs.biochem.5c00237
Abstract
Viruses are transmitted through multiple routes and can cause a wide range of diseases. Antiviral peptides (AVPs) have emerged as a cost-effective and low-side-effect strategy for combating viral infections. However, identifying antiviral peptides experimentally is both resource-intensive and time-consuming. With the advancement of artificial intelligence, accurately predicting antiviral peptide sequences has become increasingly critical to accelerate discovery efforts. In this study, we constructed a novel benchmark dataset by integrating publicly available databases and literature resources. We developed an antiviral peptide prediction model named iAVP-RFVOT, which employs the BLOSUM62 matrix as the initial feature for peptide sequences and applies uniform manifold approximation and projection (UMAP) embedding learning and Kozachenko-Leonenko estimator-based differential entropy calculation to extract derivative features. Following rigorous feature engineering, data rebalancing to address class imbalance, and optimization of an ensemble random forest classifier, we achieved a 5-fold cross-validation accuracy of 87.6% and a Matthews correlation coefficient of 0.753. Through comprehensive evaluation on our independently constructed test set, the iAVP-RFVOT model demonstrates a predictive accuracy of 85.8% and a Matthews correlation coefficient of 0.519, which substantially surpasses the performance of conventional state-of-the-art (SOTA) models.
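The abstract reports both accuracy and the Matthews correlation coefficient (MCC), and the two can diverge when classes are imbalanced. The following is a minimal sketch of how both metrics are computed from binary confusion-matrix counts. The function names and all counts are hypothetical, chosen only for illustration; they are not the paper's data and do not reproduce its results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced set: 100 positives, 100 negatives.
balanced = dict(tp=90, tn=90, fp=10, fn=10)

# Hypothetical imbalanced set: 100 positives, 900 negatives.
imbalanced = dict(tp=60, tn=860, fp=40, fn=40)

for name, counts in (("balanced", balanced), ("imbalanced", imbalanced)):
    acc = accuracy(**counts)
    mcc = matthews_corrcoef(**counts)
    print(f"{name}: accuracy={acc:.3f}, MCC={mcc:.3f}")
# balanced: accuracy=0.900, MCC=0.800
# imbalanced: accuracy=0.920, MCC=0.556
```

In this invented example the imbalanced set has the higher accuracy but the lower MCC, because the many correctly rejected negatives inflate accuracy while MCC also penalizes the weaker performance on the minority class.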