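Computer science
Protein function prediction
Pooling
Graph
Bottleneck
Annotation
Artificial intelligence
Function (biology)
Machine learning
Data mining
Protein function
Theoretical computer science
Biology
Gene
Biochemistry
Evolutionary biology
Embedded system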
Authors
Beibei Wang,Bo Cui,S.B. Chen,Xuan Wang,Yadong Wang,Xia Li
Identifier
DOI:10.1093/bioinformatics/btaf285
Abstract
Motivation: In recent years, protein function prediction has broken through the bottleneck of sequence features, significantly improving prediction accuracy by using high-precision protein structures predicted by AlphaFold2. While single-species protein function prediction methods have achieved remarkable success, multi-species approaches still face challenges such as difficult multi-source data integration and insufficient knowledge transfer between distantly related species. Integrating large-scale data and providing effective cross-species label propagation for species with sparse protein annotations remains a critical unresolved challenge. To address this problem, we propose the MSNGO model, which integrates structural features and network propagation methods. Our validation shows that using structural features can significantly improve the accuracy of multi-species protein function prediction.
Results: We employ graph representation learning techniques to extract amino acid representations from protein structure contact maps and train a structural model using a graph convolution pooling module to derive protein-level structural features. After incorporating the sequence features from ESM-2, we apply a network propagation algorithm to aggregate information and update node representations within a heterogeneous network. The results demonstrate that MSNGO outperforms previous multi-species protein function prediction methods that rely on sequence features and PPI networks.
Availability: https://github.com/blingbell/MSNGO
Supplementary information: Supplementary data are available at Bioinformatics online.
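The pipeline described in the abstract (contact map, graph convolution, pooling, network propagation) can be illustrated with a small sketch. This is not the MSNGO implementation, which is available at the repository above. It is a minimal pure-Python illustration under stated assumptions: the 8 Å contact threshold, the single untrained convolution layer, mean pooling, the damping factor `alpha`, and all function names are hypothetical choices made for this example only.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary residue contact map; the 8 angstrom cutoff is an assumed value."""
    n = len(coords)
    return [[1.0 if math.dist(coords[i], coords[j]) <= threshold else 0.0
             for j in range(n)] for i in range(n)]

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-contacts keep degrees > 0."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) if adj[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def graph_conv(norm_adj, feats, weight):
    """One graph convolution layer: ReLU(A_norm @ X @ W)."""
    n, d_in, d_out = len(feats), len(weight), len(weight[0])
    agg = [[sum(norm_adj[i][k] * feats[k][c] for k in range(n))
            for c in range(d_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(d_in)))
             for o in range(d_out)] for i in range(n)]

def mean_pool(node_feats):
    """Average residue-level vectors into one protein-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

def propagate(adj, scores, alpha=0.5, steps=20):
    """Iterate F <- alpha * P @ F + (1 - alpha) * F0 with row-normalized P."""
    n, t = len(adj), len(scores[0])
    row_sums = [sum(row) for row in adj]
    p = [[adj[i][j] / row_sums[i] if row_sums[i] else 0.0
          for j in range(n)] for i in range(n)]
    current = [row[:] for row in scores]
    for _ in range(steps):
        current = [[alpha * sum(p[i][k] * current[k][c] for k in range(n))
                    + (1 - alpha) * scores[i][c] for c in range(t)]
                   for i in range(n)]
    return current

# Toy protein with three residues; the third is far from the other two.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
residue_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # placeholder residue features
identity = [[1.0, 0.0], [0.0, 1.0]]                    # untrained stand-in weights
protein_vec = mean_pool(graph_conv(normalize_adjacency(cmap), residue_feats, identity))

# Toy two-protein network: protein 0 is annotated with a term, protein 1 is not.
network = [[0.0, 1.0], [1.0, 0.0]]
propagated = propagate(network, [[1.0], [0.0]])
```

In this toy network, the unannotated protein ends with a nonzero score for the term, which is the basic intuition behind propagating labels to sparsely annotated species. A real heterogeneous network would also distinguish edge types (for example, within-species PPI edges versus cross-species similarity edges), which this sketch does not model.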