Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures

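Keywords: Nucleic acid; Computer science; Graph; Computational biology; Artificial intelligence; Nucleic acid structure; RNA; Context (archaeology); RNA-binding protein; Protein function prediction; Machine learning; Theoretical computer science; Algorithm; Biology; Biochemistry; Gene; Protein function; Paleontology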
Authors
Yidong Song, Qianmu Yuan, Huijun Zhao, Yuedong Yang
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (Issue): 24 (6)  Citations: 10
Identifier
DOI: 10.1093/bib/bbad360
Abstract

Interactions between nucleic acids and proteins are important in diverse biological processes, yet high-quality prediction of nucleic-acid-binding sites remains a significant challenge. Sequence-based methods are limited because they consider only sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Although protein structures predicted by AlphaFold2 could be used, the extensive computing requirements of AlphaFold2 hinder its use in genome-wide applications. Building on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold-predicted structures. Here, the predicted protein structures are used to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict DNA- and RNA-binding sites, respectively. In comprehensive tests on DNA/RNA benchmark datasets, GLMSite surpassed the latest sequence-based methods and was comparable with structure-based methods. Moreover, its predictions were shown to be useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, code, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
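
The graph construction described in the abstract (residues as nodes, spatially neighboring residue pairs as edges) can be illustrated with a minimal sketch. This is not the GLMSite implementation: the abstract does not say how "spatially neighboring" is defined, so the use of C-alpha coordinates, the 10 Å distance cutoff, and the function name `build_residue_graph` are all assumptions made here for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def build_residue_graph(
    ca_coords: Sequence[Coord], cutoff: float = 10.0
) -> List[Tuple[int, int]]:
    """Return undirected edges (i, j) with i < j between residues whose
    C-alpha atoms lie within `cutoff` angstroms of each other.

    Hypothetical helper: the cutoff value and the use of C-alpha atoms
    are assumptions, not details taken from the paper.
    """
    edges = []
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist computes the Euclidean distance between two points.
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


# Toy example: four residues placed along a line, 6 angstroms apart.
toy_coords = [
    (0.0, 0.0, 0.0),
    (6.0, 0.0, 0.0),
    (12.0, 0.0, 0.0),
    (18.0, 0.0, 0.0),
]
toy_edges = build_residue_graph(toy_coords, cutoff=10.0)
print(toy_edges)
```

In this toy case only consecutive residues fall within the cutoff, so the graph is a simple chain. A real pipeline would read coordinates from an ESMFold-predicted structure file and would likely use a faster neighbor search than this quadratic loop.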