Authors
Wen-Wei Liao, Mobin Asri, Jana Ebler, Daniel Doerr, Marina Haukness, Glenn Hickey, Shuangjia Lu, Julian Lucas, Jean Monlong, Haley J. Abel, Silvia Buonaiuto, Xian Chang, Haoyu Cheng, Justin Jang Hann Chu, Vincenza Colonna, Jordan M. Eizenga, Xiaowen Feng, Christian S. Fischer, Robert S. Fulton, Shilpa Garg, Cristian Groza, Andrea Guarracino, William S. Harvey, Simon Heumos, Kerstin Howe, Miten Jain, Tsung-Yu Lu, Charles Markello, Fergal J. Martin, Matthew G. E. Mitchell, Katherine M. Munson, Moses Njagi Mwaniki, Adam M. Novak, Hugh E. Olsen, Trevor Pesout, David Porubsky, Mark G. Sterken, Jonas Andreas Sibbesen, Chad Tomlinson, Flavia Villani, Mitchell R. Vollger, Guillaume Bourque, Mark Chaisson, Paul Flicek, Adam M. Phillippy, Justin M. Zook, Evan E. Eichler, David Haussler, Erich D. Jarvis, Karen H. Miga, Ting Wang, Erik Garrison, Tobias Marschall, Ira M. Hall, Heng Li, Benedict Paten
Abstract
The Human Pangenome Reference Consortium (HPRC) presents a first draft human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence and are more than 99% accurate at the structural and base-pair levels. Based on alignments of the assemblies, we generated a draft pangenome that captures known variants and haplotypes, reveals novel alleles at structurally complex loci, and adds 119 million base pairs of euchromatic polymorphic sequence and 1,529 gene duplications relative to the existing reference, GRCh38. Roughly 90 million of the additional base pairs derive from structural variation. Using our draft pangenome to analyze short-read data reduces errors when discovering small variants by 34% and boosts the detected structural variants per haplotype by 104% compared to GRCh38-based workflows, and by 34% compared to using previous diversity sets of genome assemblies.