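Computer science
Discriminative model
Support vector machine
Artificial intelligence
Pseudo amino acid composition
Feature (linguistics)
Ensemble forecasting
Machine learning
Pattern recognition (psychology)
Data mining
Peptide
Chemistry
Linguistics
Biochemistry
Philosophy
Dipeptide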
Authors
Muhammad Arif,Saeed Ahmed,Fang Ge,Muhammad Kabir,Yaser Daanial Khan,Dong‐Jun Yu,Maha A. Thafar
Identifier
DOI:10.1016/j.chemolab.2021.104458
Abstract
Anticancer peptides (ACPs) have emerged as potentially safe therapeutic agents for treating cancer. Identifying novel ACPs is crucial for gaining deeper insight into their functional mechanisms and for vaccine production. Conventional wet-lab methods for finding ACPs are expensive, slow, and resource-intensive. Given the massive number of peptide sequences accumulated in the post-genomic era, fast and accurate computational prediction of ACPs is highly desirable. Several intelligent statistical approaches have recently been designed to discriminate ACPs from non-ACPs. Despite remarkable progress, existing methods still rely on inadequate feature descriptors and learning algorithms, which limits their predictive performance. To address this, we develop a novel predictor called StackACPred for the accurate identification of ACPs. Specifically, the proposed method uses three feature encoding strategies that capture evolutionary-profile and physicochemical information: segmented position-specific scoring matrix (SegPSSM), pseudo position-specific scoring matrix (PsePSSM), and extended pseudo amino acid composition (PseAAC). The extracted features are serially fused and then optimized with a support vector machine recursive feature elimination and correlation bias reduction (SVM-RFE + CBR) algorithm. The selected optimal features are used to build a stacking-based ensemble model for identifying ACPs. Under 5-fold cross-validation, StackACPred attained 84.45% and 86.21% accuracy on the ACP740 and ACP240 datasets, which is 2.97% and 0.79% higher, respectively, than existing methods. These empirical results demonstrate the strong discriminative power of the tool for large-scale annotation of ACPs in particular, and of other peptides in general.
• We developed an intelligent predictor named StackACPred for the accurate identification of ACPs.
• Three feature encoding strategies based on evolutionary-profile and physicochemical information were used: N-segmentation position-specific scoring matrix (N-SegPSSM), pseudo position-specific scoring matrix (PsePSSM), and extended pseudo amino acid composition (PseAAC).
• A support vector machine recursive feature elimination and correlation bias reduction (SVM-RFE + CBR) algorithm was used to select the optimal features.
• LightGBM and stacking-based ensemble classifiers were used to predict ACPs, evaluated with k-fold cross-validation.
• StackACPred produced better results than other state-of-the-art predictors.
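The abstract lists extended PseAAC as one of the encodings but does not define the extension. As a hedged illustration only, the sketch below computes Chou's classic type-1 PseAAC: 20 normalized amino acid frequencies plus λ sequence-order correlation factors θ_k (the mean squared difference of a standardized physicochemical property between residues k positions apart), combined through a weight factor w. For brevity it uses a single property, the Kyte-Doolittle hydropathy scale, whereas Chou's original formulation averages over hydrophobicity, hydrophilicity, and side-chain mass. The function names and the values λ = 3 and w = 0.05 are illustrative assumptions, not StackACPred's settings.

```python
# Assumption: a single property (Kyte-Doolittle hydropathy) stands in for the
# three properties of Chou's original PseAAC; this is not StackACPred's exact encoding.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale):
    """Rescale property values to zero mean and unit (population) std over the 20 residues."""
    vals = list(scale.values())
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return {aa: (v - mean) / std for aa, v in scale.items()}


def pseaac(seq, lam=3, w=0.05, scale=KD):
    """Hypothetical helper: type-1 PseAAC vector of length 20 + lam.

    Only the 20 standard residues are supported (other letters raise KeyError).
    """
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(scale)
    L = len(seq)
    # Normalized occurrence frequency of each standard amino acid.
    freqs = [seq.count(aa) / L for aa in AMINO_ACIDS]
    thetas = []
    for k in range(1, lam + 1):
        # k-th tier correlation factor: mean squared property difference at distance k.
        diffs = [(h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(L - k)]
        thetas.append(sum(diffs) / (L - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

For example, `pseaac("GLFDIVKKVVGALGSL", lam=3)` returns a 23-dimensional vector whose entries sum to 1 (up to floating-point rounding). Supporting more properties only changes the θ_k step: average the squared differences over several standardized scales.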
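PsePSSM is commonly derived from a PSSM (typically generated with PSI-BLAST), an L × 20 matrix with one row per residue. One widely used definition keeps the 20 column means and appends, for each lag ξ = 1..λ, the 20 mean squared differences between rows that are ξ positions apart, which yields 20 + 20λ features regardless of peptide length. The sketch below follows that definition; the logistic normalization of raw scores, the lag of 2, and the `pse_pssm` name are assumptions, since the abstract does not state StackACPred's settings.

```python
import math


def pse_pssm(pssm, lag=2):
    """Hypothetical helper: PsePSSM vector from an L x 20 PSSM given as a list of rows.

    Returns 20 column means followed by 20 * lag lagged squared-difference terms.
    """
    L = len(pssm)
    if L <= lag:
        raise ValueError("sequence must be longer than lag")
    # Assumption: squash raw log-odds scores into (0, 1) with a logistic function;
    # other normalizations (e.g. per-row z-scores) also appear in the literature.
    p = [[1.0 / (1.0 + math.exp(-x)) for x in row] for row in pssm]
    # Conventional part: average score of each of the 20 amino-acid columns.
    means = [sum(p[i][j] for i in range(L)) / L for j in range(20)]
    lagged = []
    for xi in range(1, lag + 1):
        for j in range(20):
            # Sequence-order part: mean squared change of column j between rows xi apart.
            total = sum((p[i][j] - p[i + xi][j]) ** 2 for i in range(L - xi))
            lagged.append(total / (L - xi))
    return means + lagged
```

Because every peptide maps to the same 20 + 20λ dimensions, the output can be concatenated directly with other fixed-length descriptors before feature selection.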
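SVM-RFE ranks features by the squared weights w_j² of a linear SVM and repeatedly discards the lowest-ranked feature, retraining after each removal. The pure-Python sketch below trains the linear SVM by full-batch subgradient descent on the L2-regularized hinge loss (an illustrative solver choice) and removes one feature per round. It does not implement the correlation bias reduction (CBR) step that the abstract pairs with SVM-RFE, and the function names and hyperparameters (`lam`, `lr`, `epochs`) are assumptions for illustration.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean(hinge loss).

    Labels in y must be +1 / -1. Returns the weight vector w and bias b.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # points inside the margin contribute a hinge subgradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_rfe(X, y, n_select):
    """Recursive feature elimination; CBR is NOT implemented in this sketch.

    Returns (selected feature indices, indices in the order they were eliminated).
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_select:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        # Ranking criterion: the feature with the smallest squared weight is dropped.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated
```

Reversing the elimination order gives a full feature ranking, so a cutoff can also be chosen afterwards by cross-validation. In practice, scikit-learn's `RFE` wrapped around `SVC(kernel="linear")` runs the same elimination loop with a proper SVM solver.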