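Antifreeze protein
Artificial intelligence
Support vector machine
Computer science
Machine learning
Complementarity (molecular biology)
Sequence (biology)
Similarity (geometry)
Feature (linguistics)
Pattern recognition (psychology)
Natural language processing
Chemistry
Biology
Image (mathematics)
Philosophy
Biochemistry
Genetics
Linguistics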
Authors
Saikat Dhibar,Biman Jana
Identifier
DOI:10.1021/acs.jpclett.3c02817
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth by thermal hysteresis. Classification of AFPs from sequence is difficult because of their low sequence similarity; the usual sequence-similarity algorithms, such as BLAST and PSI-BLAST, are therefore not effective. Here, a method combining n-gram feature vectors and machine learning models is proposed to accelerate the identification of potential AFPs from sequences. All n-gram features are extracted with the k-mer counting method. A comparative analysis reveals that, among different machine learning models, XGBoost outperforms the others in predicting AFPs from sequence when pentamers are used as the feature vector. When tested on an independent dataset, our method performed better than other existing ones, with sensitivity of 97.50%, recall of 98.30%, and F1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
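The k-mer counting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `kmer_counts` and `build_feature_matrix`, the toy sequences, and the choice to build the vocabulary only from k-mers that actually occur are all assumptions made for illustration. The sketch slides a window of length k (5 for pentamers) along each sequence, counts every overlapping k-mer, and arranges the counts into one row per sequence.

```python
from collections import Counter


def kmer_counts(sequence, k=5):
    """Count every overlapping k-mer (window of k residues) in one sequence."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range and therefore no k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_feature_matrix(sequences, k=5):
    """Return (vocabulary, rows): one count vector per input sequence.

    Assumption: the vocabulary holds only k-mers observed in the input,
    sorted alphabetically, rather than all 20**k possible k-mers.
    """
    counts = [kmer_counts(s, k) for s in sequences]
    vocabulary = sorted(set().union(*counts))
    rows = [[c[kmer] for kmer in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical toy sequences, not real antifreeze proteins.
toy_sequences = ["ACDEFACDEF", "ACDEFG"]
vocabulary, rows = build_feature_matrix(toy_sequences, k=5)
for kmer, column in zip(vocabulary, zip(*rows)):
    print(kmer, column)
```

Rows built this way could then be passed, together with AFP/non-AFP labels, to a classifier such as `xgboost.XGBClassifier`. With real protein data the pentamer vocabulary becomes very large, so a sparse representation would likely be preferable to the dense lists used here.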