Authors
Arang Rhie, Sergey Nurk, Monika Čechová, Savannah J. Hoyt, Dylan J. Taylor, Nicolas Altemose, Paul W. Hook, Sergey Koren, Mikko Rautiainen, I. A. Alexandrov, Jamie Allen, Mobin Asri, Andrey V. Bzikadze, Nae-Chyun Chen, Chen-Shan Chin, Mark Diekhans, Paul Flicek, Giulio Formenti, Arkarachai Fungtammasan, Carlos García Girón, Erik Garrison, Ariel Gershman, Jennifer L. Gerton, Patrick G. S. Grady, Andrea Guarracino, Leanne Haggerty, Reza Halabian, Nancy F. Hansen, Robert S. Harris, Gabrielle A. Hartley, William T. Harvey, Marina Haukness, Jakob Heinz, Thibaut Hourlier, Robert Hubley, Sarah Hunt, Stephen Hwang, Miten Jain, Rupesh K. Kesharwani, Alexandra P. Lewis, Heng Li, Glennis A. Logsdon, Julian Lucas, Wojciech Makałowski, Christopher Markovic, Fergal J. Martin, Ann Mc Cartney, Rajiv C. McCoy, Jennifer McDaniel, Brandy M. McNulty, Paul Medvedev, Alla Mikheenko, Katherine M. Munson, Terence Murphy, Hugh E. Olsen, Nathan D. Olson, Luis F. Paulin, David Porubský, Tamara Potapova, Fedor Ryabov, Steven L. Salzberg, Michael E.G. Sauria, Fritz J. Sedlazeck, Kishwar Shafin, В. А. Шепелев, Alaina Shumate, Jessica M. Storer, Angela M. Taravella Oill, Françoise Thibaud‐Nissen, Winston Timp, Marta Tomaszkiewicz, Mitchell R. Vollger, Brian P. Walenz, Allison C. Watwood, Matthias H. Weissensteiner, Aaron M. Wenger, Melissa A. Wilson, Samantha Zarate, Yiping Zhu, Justin M. Zook, Evan E. Eichler, Rachel J. O’Neill, Michael C. Schatz, Karen H. Miga, Kateryna D. Makova, Adam M. Phillippy
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications 1–3. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished 4,5. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY gene families; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome 4 and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.