ProtGO: A Transformer based Fusion Model for accurately predicting Gene Ontology (GO) Terms from full scale Protein Sequences

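Transformer, Gene ontology, Computer science, Computational biology, Fusion, Ontology, Gene, Artificial intelligence, Genetics, Biology, Engineering, Gene expression, Electrical engineering, Philosophy, Linguistics, Epistemology, Voltage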
Authors
Azwad Tamir,J.S. Yuan
Source
Journal: Cornell University - arXiv
Identifier
DOI:10.48550/arxiv.2412.05776
Abstract

Recent developments in next-generation sequencing technology have led to the creation of extensive, open-source protein databases consisting of hundreds of millions of sequences. To make these sequences usable in biomedical applications, they must be meticulously annotated, either through wet-lab testing or by extracting annotations from the existing literature. Over the last few years, researchers have developed numerous automatic annotation systems, particularly deep learning models, to address this issue. In this work, we propose a transformer-based fusion model capable of predicting Gene Ontology (GO) terms from full-scale protein sequences, achieving state-of-the-art accuracy compared to other contemporary machine learning annotation systems. The approach performs particularly well on clustered split datasets, in which the training and testing samples originate from distinct, structurally diverse distributions. This demonstrates that the model is able to capture both short- and long-term dependencies within the protein's structure and can precisely identify the motifs associated with the various GO terms. Furthermore, the technique is lightweight and less computationally expensive than the benchmark methods, while at the same time being unaffected by sequence length, rendering it appropriate for diverse applications with varying sequence lengths.
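
To illustrate what a clustered split means in practice, the following is a minimal sketch that assigns whole clusters of sequences to either the training or the test set, so that no cluster contributes to both. It is not the authors' procedure: the function name `clustered_split`, the `cluster_ids` input, and the toy labels are assumptions made for illustration, and the cluster labels are assumed to come from a separate sequence-similarity clustering step.

```python
import random
from collections import defaultdict


def clustered_split(cluster_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no cluster spans both train and test.

    cluster_ids: one cluster label per sequence (hypothetical input,
    assumed to be produced beforehand by a similarity clustering step).
    """
    # Group sample indices by their cluster label.
    clusters = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        clusters[cid].append(idx)

    # Shuffle whole clusters, not individual samples.
    labels = sorted(clusters)
    random.Random(seed).shuffle(labels)

    # Assign entire clusters to the test set until it reaches the target size.
    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cid in labels:
        if len(test) < target:
            test.extend(clusters[cid])
        else:
            train.extend(clusters[cid])
    return sorted(train), sorted(test)


# Toy example: six sequences falling into three similarity clusters.
cluster_ids = ["c1", "c1", "c2", "c2", "c3", "c3"]
train_idx, test_idx = clustered_split(cluster_ids, test_fraction=0.3)
```

Because clusters are assigned as units, the test set can slightly overshoot the requested fraction; in exchange, every test sequence belongs to a cluster the model never saw during training, which is what makes this kind of split harder than a random one.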