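Computer science
Artificial intelligence
Embedding
Protein function prediction
Pattern recognition (psychology)
Benchmark (surveying)
Component (thermodynamics)
Convolution (computer science)
Feature (linguistics)
Machine learning
Encoder
Deep learning
Artificial neural network
Protein function
Biochemistry
Chemistry
Physics
Linguistics
Philosophy
Geodesy
Gene
Operating system
Geography
Thermodynamics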
Authors
Xun Wang,Peng Qu,Xiangyu Meng,Qing Yang,Lian Qiao,Chaogang Zhang,Xian-Jin Xie
Identifier
DOI:10.1109/bibm58861.2023.10385754
Abstract
Predicting protein function from sequences through machine learning can improve the understanding of novel proteins and biological mechanisms. Existing methods mainly rely on one-dimensional convolution or natural language processing (NLP) techniques to extract features from sequences, but they suffer from limited predictive performance. To address this challenge, we propose MulAxialGO, a new method that leverages multi-modal feature fusion to improve prediction accuracy. MulAxialGO integrates the prior features of a large-scale pre-trained protein language model and the posterior features of dynamic embedding coding and sequence homology. In addition, MulAxialGO employs a comprehensive image feature encoder to extract features from sequences, providing a novel perspective for protein function prediction. MulAxialGO is tested on two benchmark datasets and achieves state-of-the-art results. On the 2016 dataset, MulAxialGO significantly outperforms DeepGOPlus, improving molecular function by 4.5 points, biological process by 2.4 points and cellular component by 1.6 points for the AUPR metric. Similarly, on the NetGO dataset, MulAxialGO outperforms the state-of-the-art NetGO2.0, improving Fmax by 1.1 points for biological process and 2.3 points for cellular component.
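The abstract reports results in Fmax and AUPR without defining them. As a rough illustration only, the sketch below computes a protein-centric Fmax in the style commonly used for function-prediction benchmarks: at each score threshold, precision is averaged over proteins that receive at least one prediction, recall is averaged over all annotated proteins, and Fmax is the best resulting F-score. This is not the authors' evaluation code. The function name `fmax`, the input layout (a set of true terms and a term-to-score dict per protein), and the 0.01 threshold grid are assumptions made for the example, and the propagation of annotations through the Gene Ontology hierarchy that real benchmarks apply is omitted.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (illustrative sketch, hypothetical helper).

    y_true:  list of sets of true GO terms, one set per protein
    y_score: list of dicts {go_term: score in [0, 1]}, one dict per protein
    """
    if thresholds is None:
        # Assumed grid: 0.01, 0.02, ..., 1.00
        thresholds = [t / 100 for t in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            if not truth:
                continue  # proteins without annotations are skipped
            predicted = {term for term, s in scores.items() if s >= t}
            tp = len(predicted & truth)
            # Recall is averaged over every annotated protein.
            recalls.append(tp / len(truth))
            # Precision is averaged only over proteins with a prediction at t.
            if predicted:
                precisions.append(tp / len(predicted))
        if not precisions or not recalls:
            continue
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


# Toy data with made-up term names and scores, purely for illustration.
y_true = [{"a", "b"}, {"c"}]
y_score = [{"a": 0.9, "d": 0.8}, {"c": 0.6, "a": 0.7}]
print(round(fmax(y_true, y_score), 3))
```

In practice the metric is computed separately for each of the three ontologies named in the abstract (molecular function, biological process, cellular component).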