Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures

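Keywords: Nucleic acid, Computer science, Graph, Computational biology, Artificial intelligence, Nucleic acid structure, RNA, Context (archaeology), RNA-binding protein, Protein function prediction, Machine learning, Theoretical computer science, Algorithm, Biology, Biochemistry, Gene, Protein function, Paleontology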
Authors
Yidong Song,Qianmu Yuan,Huijun Zhao,Yuedong Yang
Source
Journal: Briefings in Bioinformatics [Oxford University Press]
Volume (issue): 24 (6)  Citations: 10
Identifier
DOI:10.1093/bib/bbad360
Abstract

Interactions between nucleic acids and proteins are important in diverse biological processes, yet high-quality prediction of nucleic-acid-binding sites remains a significant challenge. Sequence-based methods are limited because they consider only sequence context, whereas structure-based methods cannot be applied to proteins without known tertiary structures. Protein structures predicted by AlphaFold2 could be used, but AlphaFold2's heavy computational cost hinders genome-wide application. Building on the recent breakthrough of ESMFold for fast protein structure prediction, we developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold-predicted structures. The predicted structures are used to construct a protein structural graph, with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced with the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the resulting geometric embeddings were fed into a shared network to learn binding characteristics common to both tasks. Finally, these characteristics were passed to two fully connected layers that predict DNA- and RNA-binding sites, respectively. In comprehensive tests on DNA/RNA benchmark datasets, GLMSite outperformed the latest sequence-based methods and performed comparably to structure-based methods. Moreover, its predictions proved useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, code, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
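The data flow summarized in the abstract (residue graph from predicted coordinates, a shared representation, then two task-specific fully connected heads for DNA and RNA) can be illustrated with a small pure-Python sketch. This is not GLMSite's implementation: the 10 Å C-alpha distance cutoff used to define "spatially neighboring" residues, the single round of mean message passing (a simplified stand-in for the geometric vector perceptron, not a GVP), the sigmoid outputs, the toy one-dimensional features in place of ProtTrans embeddings, and the function names `build_residue_graph`, `mean_aggregate` and `predict_binding` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Edge = Tuple[int, int]


def build_residue_graph(ca_coords: Sequence[Coord], cutoff: float = 10.0) -> List[Edge]:
    """Directed edges (i, j), i != j, between residues whose C-alpha atoms lie
    within `cutoff` angstroms. The 10 A cutoff is a hypothetical choice."""
    edges: List[Edge] = []
    for i, ci in enumerate(ca_coords):
        for j, cj in enumerate(ca_coords):
            if i != j and math.dist(ci, cj) <= cutoff:
                edges.append((i, j))
    return edges


def mean_aggregate(features: Sequence[Sequence[float]], edges: Sequence[Edge]) -> List[List[float]]:
    """One round of mean message passing over each node and its neighbours.
    A simplified stand-in for illustration only, NOT a geometric vector perceptron."""
    sums = [list(f) for f in features]
    counts = [1] * len(features)  # each node starts by counting itself
    for i, j in edges:
        # Edge (i, j): node i receives the features of its neighbour j.
        sums[i] = [a + b for a, b in zip(sums[i], features[j])]
        counts[i] += 1
    return [[v / c for v in row] for row, c in zip(sums, counts)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_binding(shared: Sequence[float],
                    w_dna: Sequence[float], b_dna: float,
                    w_rna: Sequence[float], b_rna: float) -> Tuple[float, float]:
    """Two task-specific linear heads applied to the same shared residue
    embedding; returns (DNA-binding probability, RNA-binding probability)."""
    dna_logit = sum(s * w for s, w in zip(shared, w_dna)) + b_dna
    rna_logit = sum(s * w for s, w in zip(shared, w_rna)) + b_rna
    return sigmoid(dna_logit), sigmoid(rna_logit)


if __name__ == "__main__":
    # Toy C-alpha coordinates (angstroms) and 1-D toy node features standing in
    # for language-model embeddings; weights are arbitrary, not trained.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
    edges = build_residue_graph(coords)
    shared = mean_aggregate([[1.0], [3.0], [5.0]], edges)
    for k, emb in enumerate(shared):
        print(k, predict_binding(emb, [0.5], -1.0, [-0.5], 1.0))
```

Running the script prints a (DNA, RNA) probability pair for each toy residue. In this toy input, residues 0 and 1 are within the cutoff of each other and end up with the same aggregated embedding, while residue 2 is isolated and keeps its own feature. Keeping one shared representation and splitting only at the final layer mirrors the abstract's description of a common network feeding two separate fully connected layers, so both tasks can benefit from features learned on either.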
