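Aptamer
Transformer
Computer science
Computational biology
Artificial intelligence
Molecular biology
Biology
Engineering
Electrical engineering
Voltage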
Authors
Buyong Ma, Zhichao Yan, Yue Kang
Identifier
DOI:10.1093/clinchem/hvaf086.545
Abstract
Background: Aptamers have drawn significant attention given the emerging prominence of nucleic acid-based therapeutics and diagnostics. Aptamers are single-stranded oligonucleotides or short peptides characterized by a distinctive three-dimensional architecture, with oligonucleotide aptamers comprising 20 to 100 nucleotides (nt). They exhibit high affinity and specificity towards target molecules and have great potential in detection and medicine. The SELEX technique used to select them is an empirical experimental method; aptamers obtained this way are often time-consuming to produce and may have low affinity. With the development of computational technology, artificial intelligence algorithms have demonstrated excellent performance in the field of nucleic acids, and several machine learning approaches have been published to predict protein-aptamer interactions.
Methods: Here, we present SelfTrans-Ensemble, a deep learning model that integrates sequence information models and structural information models to extract multi-scale features for predicting aptamer-protein interactions (APIs). The model employs two pre-trained models, ProtBert and RNA-FM, to encode protein and aptamer sequences, along with features generated from primary sequence and secondary structural information. To address the imbalance in the aptamer dataset, we incorporated short RNA-protein interaction data into the training set.
Results: We compiled a dataset consisting of 1422 aptamer/RNA sequences and 848 protein sequences, for a total of 1934 aptamer/RNA-protein interaction entries. Our model achieved a training accuracy of 98.9% and a test accuracy of 88.0%, demonstrating the model's effectiveness in accurately predicting APIs.
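As an illustration of the "features generated from primary sequence and secondary structural information" mentioned in the Methods, the following is a minimal sketch only. The abstract does not say which features SelfTrans-Ensemble uses, so the function name `sequence_structure_features`, the chosen features (base fractions, GC content, paired fraction), and the use of dot-bracket notation for the secondary structure are all assumptions made for this example.

```python
from collections import Counter

RNA_ALPHABET = "ACGU"


def sequence_structure_features(seq: str, dot_bracket: str) -> dict[str, float]:
    """Toy hand-crafted features for one aptamer (hypothetical feature set).

    seq         -- nucleotide sequence; DNA 'T' is mapped to 'U'
    dot_bracket -- secondary structure in dot-bracket notation, same length
    """
    if not seq or len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must be non-empty and equal in length")

    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    n = len(seq)

    # Primary-sequence features: fraction of each base and overall GC content.
    features = {f"frac_{base}": counts[base] / n for base in RNA_ALPHABET}
    features["gc_content"] = (counts["G"] + counts["C"]) / n

    # Secondary-structure feature: fraction of positions that are base-paired.
    features["paired_fraction"] = sum(ch in "()" for ch in dot_bracket) / n
    features["length"] = float(n)
    return features


# A small hairpin: three paired bases on each side of a four-base loop.
example = sequence_structure_features("GGGAAAUCCC", "(((....)))")
```

In a model of the kind described, a vector like this would presumably be combined with the learned embeddings of the aptamer and protein sequences; how that combination is done is not stated in the abstract.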
We investigated the attention learned for aptamer and protein sequences to identify the residues and nucleotides that enable APIs, and evaluated whether the transformer-based network can efficiently capture short-range and long-range dependencies in aptamer and protein sequences. For a DNA aptamer binding to Von Willebrand Factor (VWF, PDB 3HXO), we found that the learned attention correlates strongly with binding, which is consistent with previous structural analyses. Additionally, analysis using molecular simulation indicated that SelfTrans-Ensemble is sensitive to aptamer sequence mutations.
Conclusion: SelfTrans-Ensemble exhibits an F1 score of 0.896 and an AUC of 0.9232, indicating that the model is capable of effectively predicting APIs. We further explored the sensitivity of the model by assessing its response to double mutations in RNA sequences and found that the transformer-based model can capture small mutations in sequences, providing insight into the model's applicability to RNA design aimed at targeting specific proteins. Our approach has the potential to serve as a rapid and reliable screen for aptamer sequences that bind target proteins, improving the cost-effectiveness and efficiency of SELEX in aptamer screening.
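The double-mutation sensitivity analysis described above can be sketched roughly as follows. This is not the authors' procedure: `double_mutants`, `mutation_scan`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in (GC fraction) for whatever interaction score a trained predictor would return for a fixed target protein.

```python
from itertools import combinations, product
from typing import Callable, Iterator

RNA_ALPHABET = "ACGU"


def double_mutants(seq: str) -> Iterator[tuple[str, str]]:
    """Yield (label, mutant) for every sequence that differs from seq at exactly two positions."""
    for i, j in combinations(range(len(seq)), 2):
        alts_i = [b for b in RNA_ALPHABET if b != seq[i]]
        alts_j = [b for b in RNA_ALPHABET if b != seq[j]]
        for a, b in product(alts_i, alts_j):
            mutant = seq[:i] + a + seq[i + 1:j] + b + seq[j + 1:]
            # Label uses 1-based positions, e.g. "A1C/G3U".
            yield f"{seq[i]}{i + 1}{a}/{seq[j]}{j + 1}{b}", mutant


def mutation_scan(seq: str, score: Callable[[str], float]) -> list[tuple[str, float]]:
    """Score change of each double mutant relative to the wild type, most negative first."""
    base = score(seq)
    deltas = [(label, score(mutant) - base) for label, mutant in double_mutants(seq)]
    return sorted(deltas, key=lambda item: item[1])


def toy_score(seq: str) -> float:
    """Placeholder for a model's predicted interaction score (assumption: GC fraction)."""
    return (seq.count("G") + seq.count("C")) / len(seq)


ranked = mutation_scan("ACGU", toy_score)
```

For a sequence of length n over a four-letter alphabet, the scan evaluates 9 * n * (n - 1) / 2 mutants, so it stays cheap for aptamers in the 20 to 100 nt range. Swapping `toy_score` for a real predictor would show which pairs of positions the model considers most disruptive.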