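Index
Biology
Genotyping
DNA sequencing
Exome sequencing
Exome
Genome
1000 Genomes Project
Computational biology
Human genome
Structural variation
Population
Genomics
Genetics
Genotype
Single-nucleotide polymorphism
Mutation
DNA
Gene
Sociology
Demography
Authors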
Mark A. DePristo, Eric Banks, Ryan Poplin, Kiran Garimella, Jared Maguire, Christopher Hartl, Anthony Philippakis, Guillermo del Angel, Manuel A. Rivas, Matt Hanna, Aaron McKenna, Tim Fennell, Andrew Kernytsky, Andrey Sivachenko, Kristian Cibulskis, Stacey Gabriel, David Altshuler, Mark J. Daly
Source
Journal: Nature Genetics
[Nature Portfolio]
Date: 2011-04-10
Volume (issue), pages: 43 (5): 491-498
Citations: 11083
Abstract
Mark DePristo and colleagues report an analytical framework to discover and genotype variation using whole exome and genome resequencing data from next-generation sequencing technologies. They apply these methods to low-pass population sequencing data from the 1000 Genomes Project. Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.
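To make step (iii) of the abstract more concrete, here is a minimal Python sketch of the general idea behind base quality score recalibration: group bases by the quality the sequencer reported, count how often they mismatch the reference at sites assumed to be non-variant, and convert the observed mismatch rate back to the Phred scale (Q = -10 log10 p). This is a simplified illustration, not the Genome Analysis Toolkit's implementation. The function names, the input layout and the add-one smoothing are assumptions made for this example, and the sketch uses the reported quality as its only covariate.

```python
import math
from collections import defaultdict


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability to a Phred-scaled quality score."""
    return -10 * math.log10(p)


def empirical_qualities(observations):
    """Estimate an empirical quality for each reported quality score.

    `observations` is an iterable of (reported_quality, is_mismatch) pairs,
    one per sequenced base at a site assumed to be non-variant
    (hypothetical input layout chosen for this sketch).
    """
    counts = defaultdict(lambda: [0, 0])  # reported quality -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][1] += 1
        if is_mismatch:
            counts[reported_q][0] += 1

    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one smoothing (an assumption here) avoids log10(0)
        # when a bin contains no mismatches.
        rate = (mismatches + 1) / (total + 2)
        table[reported_q] = error_prob_to_phred(rate)
    return table


if __name__ == "__main__":
    # 998 bases reported as Q30, of which 9 mismatch the reference.
    obs = [(30, True)] * 9 + [(30, False)] * 989
    print(empirical_qualities(obs))
```

In this toy input, bases reported at Q30 (a nominal error rate of 1 in 1,000) mismatch the reference at a smoothed rate of about 1 in 100, so the table maps the reported Q30 to an empirical quality near Q20.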