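Computer science
Artificial intelligence
Representation (politics)
Feature learning
Source code
Graph
Function (biology)
Feature (linguistics)
Protein function prediction
Pattern recognition (psychology)
Machine learning
Protein function
Theoretical computer science
Biology
Gene
Evolutionary biology
Biochemistry
Linguistics
Philosophy
Politics
Political science
Law
Operating system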
Authors
Zhijian Huang,Ruisheng Zheng,Lei Deng
Identifier
DOI:10.1109/bibm55620.2022.9994899
Abstract
Exploring the functions of proteins is crucial for explaining cellular mechanisms, treating diseases, and developing new drugs. Due to experimental limitations, large-scale identification of protein function remains a challenging task in cell biology. Here we propose DeepFusionGO, a novel protein function prediction method that adopts a graph representation learning approach (GraphSAGE) to extract features from heterogeneous data sources. First, we generate embeddings from protein sequences using a pre-trained protein language model and from InterPro domains with scaling gradient. Then we integrate these two embeddings with adaptive feature weights into the protein-protein interaction (PPI) graph and use GraphSAGE to generate the representation vector. Finally, we build the classification model to predict protein function based on the concatenated feature vector. The experimental results show that DeepFusionGO outperforms existing state-of-the-art methods, including the sequence-based DeepGOPlus and the PPI-based DeepGraphGO. DeepFusionGO also performs well in difficult protein function prediction. We demonstrate that selecting an appropriate protein feature fusion method can improve prediction performance, and that combining the PPI network with the protein representation vector obtained from the protein language model through the GraphSAGE algorithm is an effective way to mine potential functional clues. The source code and data sets are available at: https://github.com/Hhhzj-7/DeepFusionGO.
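The fusion and graph-aggregation steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the helper names (`fuse`, `sage_mean_layer`), the fixed fusion weights, the toy PPI graph, and the hand-picked weight matrix are all assumptions made for illustration. In the paper the feature weights are adaptive, and the real model uses high-dimensional language-model and InterPro embeddings followed by a classifier, which this sketch omits.

```python
from typing import Dict, List

Vector = List[float]


def fuse(seq_emb: Vector, dom_emb: Vector, w_seq: float, w_dom: float) -> Vector:
    """Weighted sum of two same-length embeddings.

    Assumption: the weights are fixed here; the abstract describes them as adaptive.
    """
    total = w_seq + w_dom
    return [(w_seq * s + w_dom * d) / total for s, d in zip(seq_emb, dom_emb)]


def mean_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def sage_mean_layer(
    features: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    weight: List[List[float]],
) -> Dict[str, Vector]:
    """One GraphSAGE layer with a mean aggregator.

    For each node v: h_v' = ReLU(W . concat(h_v, mean of neighbor features)).
    """
    out: Dict[str, Vector] = {}
    for node, h_self in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            h_nbr = mean_vectors([features[u] for u in nbrs])
        else:
            h_nbr = [0.0] * len(h_self)  # isolated node: no neighbor signal
        h_cat = h_self + h_nbr  # list concatenation
        out[node] = [max(0.0, sum(w * x for w, x in zip(row, h_cat))) for row in weight]
    return out


# Toy data (hypothetical): three proteins with 2-dimensional embeddings.
features = {
    "P1": fuse([1.0, 0.0], [0.0, 1.0], 0.75, 0.25),
    "P2": fuse([0.0, 2.0], [2.0, 0.0], 0.5, 0.5),
    "P3": fuse([4.0, 0.0], [0.0, 0.0], 0.5, 0.5),
}
# Toy PPI graph: P1 interacts with P2 and P3.
neighbors = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1"]}
# Hand-picked 2x4 weight matrix that adds the self and neighbor halves.
weight = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

embeddings = sage_mean_layer(features, neighbors, weight)
print(embeddings)
```

In a trained model the weight matrix and fusion weights would be learned, and the resulting node representations would feed the classifier that predicts function labels.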