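Phase shifter
Imputation (statistics)
Haplotype
Computer science
Reference genome
1000 Genomes Project
Sanger sequencing
Population
Data mining
Computational biology
Biology
Genetics
Algorithm
Single-nucleotide polymorphism
Genome
DNA sequencing
Missing data
Allele
Medicine
Physics
Genotype
Machine learning
Optics
Gene
Environmental health
DNA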
Authors
Po-Ru Loh, Petr Danecek, Pier Francesco Palamara, Christian Fuchsberger, Yakir Reshef, Hilary Finucane, Sebastian Schoenherr, Lukas Forer, Shane McCarthy, Gonçalo R. Abecasis, Richard Durbin, Alkes L. Price
Abstract
Haplotype phasing is a fundamental problem in medical and population genetics. Phasing is generally performed via statistical phasing within a genotyped cohort, an approach that can attain high accuracy in very large cohorts but attains lower accuracy in smaller cohorts. Here, we instead explore the paradigm of reference-based phasing. We introduce a new phasing algorithm, Eagle2, that attains high accuracy across a broad range of cohort sizes by efficiently leveraging information from large external reference panels (such as the Haplotype Reference Consortium, HRC) using a new data structure based on the positional Burrows-Wheeler transform. We demonstrate that Eagle2 attains a ≈20x speedup and ≈10% increase in accuracy compared to reference-based phasing using SHAPEIT2. On European-ancestry samples, Eagle2 with the HRC panel achieves >2x the accuracy of 1000 Genomes-based phasing. Eagle2 is open source and freely available for HRC-based phasing via the Sanger Imputation Service and the Michigan Imputation Server.
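For readers unfamiliar with the positional Burrows-Wheeler transform mentioned above, the following is a minimal sketch of its basic building block: the positional prefix array, which orders reference haplotypes by their reversed prefixes at each site so that haplotypes sharing a long recent match end up adjacent. This is not Eagle2's data structure, which the abstract does not describe in detail. The function name `pbwt_prefix_arrays` and the list-of-lists input format (biallelic sites coded 0/1) are assumptions made for illustration.

```python
def pbwt_prefix_arrays(haplotypes):
    """Build positional prefix arrays for a panel of binary haplotypes.

    haplotypes: list of equal-length lists of 0/1 alleles (hypothetical format).
    Returns a list of orderings; entry k is the haplotype order after
    processing the first k sites, i.e. sorted by reversed prefix of length k.
    """
    n_hap = len(haplotypes)
    n_sites = len(haplotypes[0]) if n_hap else 0

    order = list(range(n_hap))      # before any site, keep input order
    arrays = [order[:]]

    for k in range(n_sites):
        # Stable partition of the current order by the allele at site k:
        # haplotypes carrying 0 come first, then those carrying 1.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        arrays.append(order[:])

    return arrays


if __name__ == "__main__":
    panel = [
        [0, 1, 0],
        [1, 1, 0],
        [0, 0, 1],
        [1, 0, 1],
    ]
    for k, a in enumerate(pbwt_prefix_arrays(panel)):
        print(k, a)
```

Each update is a single linear pass over the panel, so the full set of orderings costs time proportional to the number of haplotypes times the number of sites. That linear cost is one reason a structure of this kind suits very large reference panels.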