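Computational biology
Artificial neural network
Sequence (biology)
Computer science
Artificial intelligence
Small interfering RNA
Graph
Deep neural network
Ribonucleic acid (RNA)
Biology
Genetics
Theoretical computer science
Gene
Authors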
Robin Eamonn Long,Ziyu Guo,Da Han,Xudong Yang,Guangyong Chen,Pheng‐Ann Heng,Liang Zhang
Identifier
DOI:10.1101/2024.04.28.591509
Abstract
With growing attention on siRNA silencing efficacy prediction, many methods have recently been proposed, ranging from traditional data analysis to advanced machine learning models. However, previous works fail to explore complex but vital information, e.g., RNA sequence interactions and related proteins. To alleviate this issue, we propose siRNADesign, a GNN model that analyzes both non-empirical and empirical-rules-based features of siRNA and mRNA sequences. This comprehensive approach allows siRNADesign to capture the nuanced dynamics of gene silencing effectively, achieving state-of-the-art results across various datasets. Furthermore, we introduce a novel dataset-splitting methodology to mitigate the data leakage and the shortcomings of traditional validation techniques found in prior works. By treating siRNA and mRNA sequences as separate entities for dataset segmentation, this method guarantees a more accurate and unbiased evaluation of the model's performance. In extensive evaluation on both widely used and external datasets, siRNADesign demonstrates high predictive accuracy and robustness under diverse experimental conditions. This work not only provides a robust foundation for the advancement of predictive models in gene silencing but also proposes a new dataset-splitting approach that aims to redefine the standards for future research, promoting more thorough and realistic assessment methodologies.
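The abstract does not spell out the splitting procedure, so the following is only a minimal sketch of the general idea of leakage-aware splitting, under the assumption that "separate entities" means whole mRNAs (or whole siRNAs) are held out so that none appears in both the training and test sets. The function name `split_by_group`, the record fields, and the toy values are hypothetical and are not taken from the paper.

```python
import random


def split_by_group(records, key, test_fraction=0.2, seed=0):
    """Hold out whole groups so no value of `key` occurs in both splits.

    `records` is a list of dicts; `key` names the field to group on
    (e.g. "mrna" or "sirna"). This is an illustrative group split,
    not the procedure used by the paper's authors.
    """
    groups = sorted({r[key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    # Always hold out at least one group.
    n_test = max(1, round(len(groups) * test_fraction))
    held_out = set(groups[:n_test])
    train = [r for r in records if r[key] not in held_out]
    test = [r for r in records if r[key] in held_out]
    return train, test


# Hypothetical toy data: each record pairs one siRNA with its target mRNA.
records = [
    {"sirna": "s1", "mrna": "mA", "efficacy": 0.81},
    {"sirna": "s2", "mrna": "mA", "efficacy": 0.35},
    {"sirna": "s3", "mrna": "mB", "efficacy": 0.62},
    {"sirna": "s4", "mrna": "mB", "efficacy": 0.12},
    {"sirna": "s5", "mrna": "mC", "efficacy": 0.77},
    {"sirna": "s6", "mrna": "mC", "efficacy": 0.48},
]

# Split on the mRNA: every siRNA targeting a held-out mRNA goes to the test set.
train, test = split_by_group(records, key="mrna", test_fraction=0.34)
```

A random row-level split of the same records could place two siRNAs targeting the same mRNA on opposite sides of the split, which is one way leakage can arise; grouping on `key` rules that out for the chosen field. Passing `key="sirna"` gives the analogous split on siRNA identity.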