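Genome
Biology
Reference genome
Index
Computational biology
Genomics
Genetics
De Bruijn graph
Graph
1000 Genomes Project
Computer science
Single-nucleotide polymorphism
Gene
Theoretical computer science
Genotype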
Authors
Goran Rakočević, Vladimir Semenyuk, Wan‐Ping Lee, James Spencer, John Browning, Ivan J. Johnson, V. Arsenijevic, Jelena Nadj, Kaushik Ghose, Maria Suciu, Sun‐Gou Ji, Gülfem Demir, Lizao Li, Berke Ç. Toptaş, Alexey Dolgoborodov, Björn Pollex, Iosif Spulber, Irina Glotova, Péter Kómár, A. L. Stachyra
Source
Journal: Nature Genetics
[Nature Portfolio]
Date: 2019-01-09
Volume (issue), pages: 51 (2): 354-362
Citations: 218
Identifier
DOI: 10.1038/s41588-018-0316-4
Abstract
The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses. Graph Genome Pipeline is a read-alignment and variant-calling pipeline based on graph genomes that offers improved read-mapping and variant-calling accuracy while achieving speed comparable to that of linear reference genome pipelines.