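Contig
DNA sequencing
Sanger sequencing
Sequence (biology)
Sequence assembly
Genome
Computer science
Hybrid genome assembly
Computational biology
Process (computing)
Alignment-free sequence analysis
Throughput
Reference genome
Deep sequencing
Data mining
DNA
Biology
Sequence alignment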
Transcriptome
Genetics
Gene
Operating system
Gene expression
Peptide sequence
Telecommunications
Wireless
Authors
Francis Y. L. Chin,Henry C. M. Leung,Siu‐Ming Yiu
Identifier
DOI:10.1007/s11427-014-4752-9
Abstract
Sequence assembly is an important step in bioinformatics studies. With next-generation sequencing (NGS) technology, DNA fragments (reads) can be randomly sampled from DNA or RNA molecules at high throughput. However, because the positions from which the reads are sampled are unknown, an assembly process is required to combine overlapping reads and reconstruct the original DNA or RNA sequence. Compared with traditional Sanger sequencing, NGS offers higher throughput, but its reads are shorter and have a higher error rate, which introduces several problems in assembly. Moreover, paired-end reads, which contain more information than single-end reads, can be sampled. Existing assemblers cannot fully utilize this information and fail to assemble longer contigs. In this article, we revisit the major problems of assembling NGS reads from genomic, transcriptomic, metagenomic and metatranscriptomic data. We also describe our IDBA package for solving these problems. The IDBA package adopts several novel ideas in assembly, including the use of multiple k values, local assembly and progressive depth removal. Compared with existing assemblers, IDBA performs better on many simulated and real sequencing datasets.
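
To make the idea of "combining overlapping reads" concrete, here is a minimal sketch of a greedy suffix-prefix merge on toy, error-free reads. It is not the IDBA method described in the abstract (which relies on multiple k values, local assembly and progressive depth removal). The names `overlap`, `greedy_assemble` and `min_len` are hypothetical and chosen only for illustration, and the sketch ignores sequencing errors, reverse complements, repeats and paired-end information.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest such overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining sequences (toy "contigs").
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                n = overlap(a, b, min_len)
                if n > best_len:
                    best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three toy reads sampled from the sequence ATGGCGTGCAATC, given out of order.
    print(greedy_assemble(["TGCAATC", "ATGGCGT", "GCGTGCA"]))
```

On such clean input the three reads collapse into a single contig. With short, error-prone reads and repeated regions, a greedy merge like this can join the wrong pair or stop early, which is the kind of difficulty the abstract attributes to NGS data.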