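Deep learning
Computer science
Artificial intelligence
Deep sequencing
DNA sequencing
Process (computing)
Embedding
Transcriptome
Computational biology
Machine learning
Pattern recognition (psychology)
Data mining
Biology
Genetics
Gene
Genome
Operating system
Gene expression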
Authors
Weiguo Li, Junchi Ma, Cuiyuan Li, Ting Yu, Xuefeng Cui
Identifier
DOI:10.1109/bibm58861.2023.10385824
Abstract
The emergence of Third-Generation Sequencing (TGS) has revolutionized transcriptome sequencing, allowing the production of long reads that span multiple kilobases. This breakthrough has enabled the sequencing of entire transcript sequences. However, the high error rates associated with TGS pose a major challenge, making it difficult to accurately classify transcript sequences against reference sequences using traditional algorithms. Fortunately, deep learning-based embedding methods can be trained to overcome these errors. In this pioneering study, we introduce trxCNN, a deep learning model that exhibits remarkable accuracy in classifying erroneous transcript sequences against reference sequences. Specifically, evaluations on simulated data show that trxCNN achieves a classification accuracy of 87.1%. This accuracy exceeds that of the Minimap2 and magicBLAST aligners, both designed for TGS data, by 10.7% and 9.0%, respectively. Furthermore, we provide evidence that trxCNN is capable of accurately estimating the abundance of transcripts. These findings strongly suggest that deep learning methods have great potential to effectively process error-affected sequencing data.