Cross-Modality and Self-Supervised Protein Embedding for Compound–Protein Affinity and Contact Prediction

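Generalizability theory; Modality (human–computer interaction); Artificial intelligence; Machine learning; Pattern; Computer science; Embedding; Computational biology; Biology; Mathematics; Social science; Statistics; Sociology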
Authors
Yuning You, Yang Shen
Identifier
DOI:10.1101/2022.07.18.500559
Abstract

Motivation: Computational methods for compound–protein affinity and contact (CPAC) prediction aim to facilitate rational drug discovery by simultaneously predicting the strength and the pattern of compound–protein interactions. Although the desired outputs are highly structure-dependent, the lack of protein structures often makes structure-free methods rely on protein sequence inputs alone. The scarcity of compound–protein pairs with affinity and contact labels further limits the accuracy and the generalizability of CPAC models.

Results: To overcome the challenges of structure naivety and labelled-data scarcity, we introduce cross-modality learning and self-supervised learning, respectively, for structure-aware and task-relevant protein embedding. Specifically, protein data are available in two modalities, 1D amino-acid sequences and predicted 2D contact maps, which are separately embedded with recurrent and graph neural networks, respectively, and jointly embedded with two cross-modality schemes. Furthermore, both protein modalities are pretrained under various self-supervised learning strategies by leveraging massive amounts of unlabelled protein data. Our results indicate that individual protein modalities differ in their strengths in predicting affinities or contacts. Proper cross-modality protein embedding combined with self-supervised learning improves model generalizability when predicting both affinities and contacts for unseen proteins.

Availability: Data and source code are available at https://github.com/Shen-Lab/CPAC.

Contact: yshen@tamu.edu

Supplementary information: Supplementary data are included.
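To make the two-modality idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain tanh recurrent pass over a one-hot 1D sequence, a single graph-convolution layer over a 2D contact map, untrained random weights, and per-residue concatenation as the fusion step. The concatenation is an illustrative assumption; the abstract does not specify the two cross-modality schemes. All function names (`rnn_embed`, `gcn_embed`, `cross_modality_embed`) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dim one-hot vectors."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Small random weights; a stand-in for trained parameters (assumption)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def rnn_embed(features, w_in, w_rec):
    """1D modality: plain tanh RNN, one hidden state per residue."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in features:
        from_input = matmul([x], w_in)[0]
        from_hidden = matmul([hidden], w_rec)[0]
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
        states.append(hidden)
    return states


def gcn_embed(features, contact_map, w):
    """2D modality: one layer of ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(contact_map)
    # Add self-loops so every residue keeps its own features.
    a_hat = [[contact_map[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), w)
    return [[max(0.0, v) for v in row] for row in out]


def cross_modality_embed(seq, contact_map, hidden_dim=8, seed=0):
    """Concatenate per-residue sequence and graph embeddings (assumed fusion)."""
    rng = random.Random(seed)
    x = one_hot(seq)
    w_in = random_matrix(len(AMINO_ACIDS), hidden_dim, rng)
    w_rec = random_matrix(hidden_dim, hidden_dim, rng)
    w_gcn = random_matrix(len(AMINO_ACIDS), hidden_dim, rng)
    seq_states = rnn_embed(x, w_in, w_rec)
    graph_states = gcn_embed(x, contact_map, w_gcn)
    return [s + g for s, g in zip(seq_states, graph_states)]


# Toy example: a 5-residue chain where only sequence neighbours are in contact.
seq = "MKTAY"
contacts = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
embedding = cross_modality_embed(seq, contacts)
print(len(embedding), len(embedding[0]))  # residues, features per residue
```

In a real pipeline, a downstream affinity or contact head would consume the per-residue embedding, and the weights would be pretrained and fine-tuned rather than sampled at random.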