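Protein sequencing
Scripting language
Computer science
Computational biology
Carbohydrate-responsive element-binding protein
Protein-protein interaction
Carbohydrate
Machine learning
Artificial intelligence
Biochemistry
Chemistry
Peptide sequence
Biology
Transcription factor
Gene
Operating system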
Authors
Quazi Farah Nawar, Md Muhaiminul Islam Nafi, Tasnim Nishat Islam, Mohammad Saifur Rahman
Identifier
DOI:10.1101/2024.02.09.579590
Abstract
A protein is a large, complex macromolecule that plays a crucial role in performing most of the work in cells and tissues. It is made up of one or more long chains of amino acid residues. Another important class of biomolecule, after DNA and protein, is carbohydrate. Carbohydrates interact with proteins to drive various biological processes. Several biochemical experiments exist for studying protein-carbohydrate interactions, but they are expensive, time-consuming and challenging. Therefore, developing computational techniques for effectively predicting protein-carbohydrate binding interactions from the protein primary sequence has given rise to a prominent new field of research. In this study, we propose StackCBEmbed, an ensemble machine learning model that classifies protein-carbohydrate binding interactions at the residue level. StackCBEmbed combines traditional sequence-based features with features derived from a pre-trained transformer-based protein language model. To the best of our knowledge, ours is the first attempt to apply a protein language model to predicting protein-carbohydrate binding interactions. StackCBEmbed achieved sensitivity, specificity and balanced accuracy scores of 0.730, 0.821, 0.776 and 0.666, 0.818, 0.742 on two independent test sets, respectively. This performance is superior to that of earlier prediction models benchmarked on the same datasets. We thus hope that StackCBEmbed will help discover novel protein-carbohydrate interactions and advance the related fields of research. StackCBEmbed is freely available as Python scripts at https://github.com/nafiislam/StackCBEmbed.
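The three scores reported for each test set are related: balanced accuracy is conventionally the arithmetic mean of sensitivity and specificity. The short sketch below checks the reported figures against that standard definition. It assumes the abstract uses that definition; the function name and the "test set 1" and "test set 2" labels are illustrative and do not come from the StackCBEmbed repository.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Standard definition: mean of the true-positive and true-negative rates."""
    return (sensitivity + specificity) / 2.0


# (sensitivity, specificity, balanced accuracy) as reported in the abstract.
# The dictionary keys are illustrative labels, not the datasets' real names.
reported = {
    "test set 1": (0.730, 0.821, 0.776),
    "test set 2": (0.666, 0.818, 0.742),
}

for name, (sens, spec, bacc) in reported.items():
    computed = balanced_accuracy(sens, spec)
    # The reported values are rounded to three decimals, so compare loosely.
    consistent = abs(computed - bacc) < 1e-3
    print(f"{name}: computed {computed:.4f}, reported {bacc:.3f}, consistent={consistent}")
```

Because binding residues are typically far rarer than non-binding ones, balanced accuracy is a more informative summary than plain accuracy for this kind of residue-level classification.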