Advancing Drug-Target interaction prediction with BERT and subsequence embedding

Authors
Yang Zhang, Juan Liu, Yalan Yan, Xiaolei Zhang, Zhimin Qiang, Xuekai Zhu, Jianfei Peng
Source
Journal: Computational Biology and Chemistry [Elsevier]
Volume: 110, article 108058
Identifier
DOI: 10.1016/j.compbiolchem.2024.108058
Abstract

Exploring the relationship between proteins and drugs plays a significant role in discovering new synthetic drugs. Drug-target interaction (DTI) prediction is a fundamental task in studying this relationship. Rather than encoding proteins by individual amino acids, we encode them by amino acid subsequences, which better simulates the biological process of DTI. To this end, we propose a novel deep learning framework based on Bidirectional Encoder Representations from Transformers (BERT) that integrates high-frequency subsequence embedding and transfer learning to perform DTI prediction. The first key module, subsequence embedding, makes it possible to explore functional interaction units in drug and protein sequences and thereby helps identify DTI modules. The second key module, transfer learning, helps the model learn common DTI features of protein and drug sequences from a large dataset. Through the multi-head self-attention mechanism, the BERT-based model learns two kinds of features: internal features within each sequence and interaction features between proteins and drugs. Compared with other methods, the BERT-based approach allows more DTI-related features to be discovered through the attention scores associated with tokenized protein and drug subsequences. We conducted extensive DTI prediction experiments on three benchmark datasets, and the model achieves average prediction metrics higher than those of most baseline methods. To verify the importance of transfer learning, we conducted an ablation study on these datasets, and the results confirm the benefit of transfer learning. In addition, we tested the scalability of the model on datasets containing unseen drugs and proteins, and the results indicate acceptable scalability.
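The abstract does not say how the high-frequency subsequences are mined or how drug and protein tokens are combined before entering BERT. The sketch below is one plausible illustration under stated assumptions. It applies a simple frequency threshold over contiguous substrings and then segments by greedy longest-match; this is a swapped-in technique, not necessarily the authors' procedure. The function names and the `max_len` and `min_count` parameters are hypothetical. The sketch also lays the two token streams out in the standard BERT sentence-pair format (`[CLS] drug [SEP] protein [SEP]` with segment ids), which is an assumption about how the pair reaches the encoder.

```python
from collections import Counter

# Hypothetical sketch: the paper's actual subsequence-mining procedure,
# thresholds, and input layout are not given in the abstract.


def build_subsequence_vocab(sequences, max_len=4, min_count=2):
    """Collect contiguous subsequences (length 2..max_len) that occur at
    least min_count times across the corpus. Single characters seen in the
    corpus are always kept as a fallback vocabulary."""
    counts = Counter()
    for seq in sequences:
        for n in range(2, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[seq[i:i + n]] += 1
    vocab = {ch for seq in sequences for ch in seq}
    vocab.update(sub for sub, c in counts.items() if c >= min_count)
    return vocab


def tokenize(seq, vocab, max_len=4):
    """Greedy longest-match segmentation of seq into vocabulary units.
    A single character is always emitted if nothing longer matches, so
    the tokens concatenate back to the original sequence."""
    tokens, i = [], 0
    while i < len(seq):
        # Try the longest candidate first and shrink until a vocab hit.
        for n in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + n]
            if n == 1 or piece in vocab:
                tokens.append(piece)
                i += n
                break
    return tokens


def make_pair_input(drug_tokens, protein_tokens):
    """BERT-style sentence-pair layout: [CLS] drug [SEP] protein [SEP],
    with segment id 0 for the drug part and 1 for the protein part."""
    tokens = ["[CLS]"] + drug_tokens + ["[SEP]"] + protein_tokens + ["[SEP]"]
    segments = [0] * (len(drug_tokens) + 2) + [1] * (len(protein_tokens) + 1)
    return tokens, segments


if __name__ == "__main__":
    proteins = ["MKTAYIAK", "MKTAYLLK", "GGMKTA"]  # toy sequences
    vocab = build_subsequence_vocab(proteins, max_len=4, min_count=2)
    print(tokenize("MKTAYIAK", vocab))  # → ['MKTA', 'Y', 'I', 'A', 'K']
    print(make_pair_input(["C", "CO"], ["MKTA", "Y"]))
```

In a full pipeline, the tokens would then be mapped to integer ids and passed to a BERT encoder pretrained on a large DTI dataset, with a classifier on the `[CLS]` representation. That stage is likewise an assumption here. Greedy longest-match keeps segmentation deterministic and guarantees full coverage, because single characters are always available as a fallback.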