Keywords
Human genome
Genome
Computational biology
Sequence assembly
DNA sequencing
Biology
Genetics
Gene
Transcriptome
Gene expression
Authors
Aaron M. Wenger, Paul Peluso, William J. Rowell, Pi-Chuan Chang, Richard Hall, Gregory T. Concepcion, Jana Ebler, Arkarachai Fungtammasan, Alexey Kolesnikov, Nathan D. Olson, Armin Töpfer, Michael Alonge, Medhat Mahmoud, Yufeng Qian, Chen-Shan Chin, Adam M. Phillippy, Michael C. Schatz, Gene Myers, Mark A. DePristo, Jue Ruan
Identifier
DOI:10.1038/s41587-019-0217-9
Abstract
The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the ‘genome in a bottle’ (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads. High-fidelity reads improve variant detection and genome assembly on the PacBio platform.
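The abstract summarizes its results with three kinds of metric: per-read accuracy (99.8%), variant-calling precision and recall against a benchmark, and assembly contig N50. The Python below is an illustrative sketch of the standard arithmetic behind each metric. It is not the authors' pipeline.

- A Phred-scaled quality is Q = -10 log10(1 - accuracy).
- Precision is TP / (TP + FP), and recall is TP / (TP + FN), where TP, FP and FN are counts from comparing calls with a truth set.
- N50 is the contig length at which the longest contigs, taken in descending order, first cover half of the total assembly length.

The function names and example counts are hypothetical. Dedicated benchmarking tools also do careful variant matching before any counting, and this sketch leaves that step out.

```python
import math
from typing import Sequence, Tuple


def phred_quality(accuracy: float) -> float:
    """Convert per-base accuracy (0 < accuracy < 1) to a Phred-scaled quality.

    Hypothetical helper for illustration: Q = -10 * log10(error rate).
    """
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from already-matched benchmark counts.

    Hypothetical helper. tp: calls confirmed by the truth set; fp: calls not
    in the truth set; fn: truth-set variants that were missed. Assumes
    tp + fp > 0 and tp + fn > 0 (otherwise division by zero is raised).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def contig_n50(lengths: Sequence[int]) -> int:
    """Return the contig length L such that contigs >= L cover >= 50% of the assembly.

    Hypothetical helper: walk contigs from longest to shortest and stop once
    the running total reaches half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty sequence of positive integers")
```

For example, `phred_quality(0.998)` is about 27.0, which is the Phred-scale equivalent of the 99.8% read accuracy quoted above. `contig_n50([2, 3, 4, 5, 6])` returns 5, because 6 + 5 = 11 already covers at least half of the 20-unit total.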