Advancing Drug-Target interaction prediction with BERT and subsequence embedding

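Subsequence; Computer science; Artificial intelligence; Embedding; Benchmark (surveying); Encoding; Encoding (memory); Task (project management); Transfer of learning; Drug target; Machine learning; Encoder; Feature learning; Key (lock); Pattern recognition (psychology); Biology; Mathematics; Gene; Mathematical analysis; Pharmacology; Bounded function; Biochemistry; Management; Geodesy; Computer security; Economics; Geography; Operating system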

Authors

Zhihui Yang, Juan Liu, Feng Yang, Xiaolei Zhang, Qiang Zhang, Xuekai Zhu, Peng Jiang

Source

Journal: Computational Biology and Chemistry [Elsevier BV]
Date: 2024-04-05 Volume: 110: 108058-108058 Citations: 2

Identifiers

DOI: 10.1016/j.compbiolchem.2024.108058

Abstract

Exploring the relationship between proteins and drugs plays a significant role in discovering new synthetic drugs. Drug-Target Interaction (DTI) prediction is a fundamental task in studying that relationship. Instead of encoding proteins amino acid by amino acid, we encode them with amino acid subsequences, which better simulates the biological process of DTI. To this end, we propose a novel deep learning framework based on Bidirectional Encoder Representations from Transformers (BERT), which integrates high-frequency subsequence embedding and transfer learning to perform the DTI prediction task. As the first key module, subsequence embedding allows the model to explore functional interaction units in drug and protein sequences, which in turn helps it find DTI modules. As the second key module, transfer learning helps the model learn common DTI features from protein and drug sequences in a large dataset. Overall, the BERT-based model can learn two kinds of features through the multi-head self-attention mechanism: the internal features of each sequence and the interaction features between proteins and drugs. Compared with other methods, BERT-based methods enable more DTI-related features to be discovered by means of the attention scores associated with tokenized protein and drug subsequences. We conducted extensive experiments for the DTI prediction task on three different benchmark datasets. The experimental results show that the model achieves average prediction metrics higher than those of most baseline methods. To verify the importance of transfer learning, we conducted an ablation study on the datasets, and the results show the superiority of transfer learning. In addition, we tested the scalability of the model on datasets containing unseen drugs and proteins, and the results show that its scalability is acceptable.
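The abstract does not spell out how the high-frequency subsequences are obtained, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a simple frequency threshold over substrings and a greedy longest-match segmentation; the names `build_vocab` and `tokenize`, the parameters `max_len` and `min_count`, and the toy protein strings are all hypothetical.

```python
from collections import Counter


def build_vocab(sequences, max_len=3, min_count=2):
    """Collect substrings of length 2..max_len seen at least min_count times.

    Assumption: a plain frequency threshold stands in for whatever
    subsequence-mining step the paper actually uses.
    """
    counts = Counter()
    for seq in sequences:
        for k in range(2, max_len + 1):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    vocab = {sub for sub, c in counts.items() if c >= min_count}
    # Single residues are always kept so that any sequence can be segmented.
    vocab.update(ch for seq in sequences for ch in seq)
    return vocab


def tokenize(seq, vocab, max_len=3):
    """Greedy longest-match segmentation of seq into vocabulary subsequences."""
    tokens = []
    i = 0
    while i < len(seq):
        for k in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + k]
            # Fall back to a single character even if it is not in the vocab.
            if piece in vocab or k == 1:
                tokens.append(piece)
                i += k
                break
    return tokens


# Toy protein fragments (hypothetical data, for illustration only).
proteins = ["MKVLAAGMKV", "GMKVLA", "AAGMKT"]
vocab = build_vocab(proteins)
tokens = tokenize("MKVLAAG", vocab)
print(tokens)
```

Tokens produced this way could then be mapped to integer ids and fed to a BERT-style encoder in place of single amino acids; the same routine would apply to drug strings such as SMILES, again as an assumption rather than a detail confirmed by the abstract.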
