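Subsequence
Computer science
Artificial intelligence
Embedding
Benchmark (surveying)
Encoding
Encoding (memory)
Task (project management)
Transfer of learning
Drug target
Machine learning
Encoder
Feature learning
Key (lock)
Pattern recognition (psychology)
Biology
Mathematics
Gene
Operating system
Geodesy
Pharmacology
Mathematical analysis
Biochemistry
Computer security
Economics
Bounded function
Management
Geography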
Authors
Yang Zhang,Juan Liu,Yalan Yan,Xiaolei Zhang,Zhimin Qiang,Xuekai Zhu,Jianfei Peng
Identifier
DOI:10.1016/j.compbiolchem.2024.108058
Abstract
Exploring the relationship between proteins and drugs plays a significant role in discovering new synthetic drugs. Drug-Target Interaction (DTI) prediction is a fundamental task in studying this relationship. Rather than encoding proteins by individual amino acids, we encode them by amino acid subsequences, which better simulates the biological process of DTI. To this end, we propose a novel deep learning framework based on Bidirectional Encoder Representations from Transformers (BERT), which integrates high-frequency subsequence embedding and transfer learning to complete the DTI prediction task. As the first key module, subsequence embedding allows the model to explore functional interaction units in drug and protein sequences, which in turn helps identify DTI modules. As the second key module, transfer learning helps the model learn common DTI features from protein and drug sequences in a large dataset. Overall, the BERT-based model can learn two kinds of features through the multi-head self-attention mechanism: internal features of each sequence and interaction features between proteins and drugs. Compared with other methods, BERT-based methods enable more DTI-related features to be discovered by means of the attention scores associated with tokenized protein and drug subsequences. We conducted extensive experiments on the DTI prediction task using three different benchmark datasets. The experimental results show that the model achieves average prediction metrics higher than those of most baseline methods. To verify the importance of transfer learning, we conducted an ablation study on the datasets, and the results show the superiority of transfer learning. In addition, we tested the scalability of the model on datasets containing unseen drugs and proteins, and the results show that its scalability is acceptable.
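The abstract does not say how high-frequency subsequences are mined or how sequences are split into them, so the following is only a minimal sketch under stated assumptions: a vocabulary is built from substrings that occur at least `min_count` times in a corpus, and each sequence is then segmented by greedy longest match, falling back to single residues. The function names `build_vocab` and `tokenize`, the parameters `max_len` and `min_count`, and the toy sequences in the usage note are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_vocab(sequences, max_len=3, min_count=2):
    """Return the set of substrings of length 2..max_len that occur
    at least min_count times across all sequences (assumed criterion)."""
    counts = Counter()
    for seq in sequences:
        for k in range(2, max_len + 1):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return {sub for sub, c in counts.items() if c >= min_count}


def tokenize(seq, vocab, max_len=3):
    """Segment seq by greedy longest match against vocab.
    Positions with no matching subsequence become single-character tokens."""
    tokens = []
    i = 0
    while i < len(seq):
        # Try the longest candidate first, down to length 2.
        for k in range(min(max_len, len(seq) - i), 1, -1):
            if seq[i:i + k] in vocab:
                tokens.append(seq[i:i + k])
                i += k
                break
        else:
            # No frequent subsequence starts here: emit one residue.
            tokens.append(seq[i])
            i += 1
    return tokens
```

For example, with the toy corpus `["MKTAYIAK", "MKTLLVAK"]`, `build_vocab` keeps the substrings shared by both sequences, and `tokenize("MKTAYIAK", vocab)` splits the sequence into a mix of multi-residue and single-residue tokens whose concatenation reproduces the input. The resulting tokens would then be mapped to embedding indices before being passed to a BERT-style encoder.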