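Substring
Tandem repeat
Genome
Genetics
Computational biology
Biology
Structural variation
Microsatellite
Allele
Human genome
Computer science
Repeated sequence
Gene
Set (abstract data type)
Programming language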
Authors
Chirag Jain, Arang Rhie, Nancy F. Hansen, Sergey Koren, Adam M. Phillippy
Source
Journal: Nature Methods
[Springer Nature]
Date: 2022-04-01
Volume (issue): 19 (6): 705-710
Citations: 172
Identifier
DOI: 10.1038/s41592-022-01457-8
Abstract
Approximately 5-10% of the human genome remains inaccessible due to the presence of repetitive sequences such as segmental duplications and tandem repeat arrays. We show that existing long-read mappers often yield incorrect alignments and variant calls within long, near-identical repeats, as they remain vulnerable to allelic bias. In the presence of a nonreference allele within a repeat, a read sampled from that region could be mapped to an incorrect repeat copy. To address this limitation, we developed a new long-read mapping method, Winnowmap2, by using minimal confidently alignable substrings. Winnowmap2 computes each read mapping through a collection of confident subalignments. This approach is more tolerant of structural variation and more sensitive to paralog-specific variants within repeats. Our experiments highlight that Winnowmap2 successfully addresses the issue of allelic bias, enabling more accurate downstream variant calls in repetitive sequences.
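The idea of anchoring a read through short, confidently placed substrings can be illustrated with a toy sketch. The abstract does not define how confidence is scored, so this example swaps in a much simpler stand-in: a substring counts as "confident" only if it occurs exactly once in the reference. It is not the Winnowmap2 algorithm, and the sequences and function names (`count_occurrences`, `minimal_unique_substrings`) are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) exact occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def minimal_unique_substrings(read, reference):
    """For each read offset i, find the shortest substring starting at i that
    occurs exactly once in the reference (toy stand-in for 'confidently
    alignable'). Returns a list of (read_offset, substring, reference_position).
    """
    anchors = []
    for i in range(len(read)):
        for j in range(i + 1, len(read) + 1):
            sub = read[i:j]
            n = count_occurrences(reference, sub)
            if n == 0:
                break  # a longer substring cannot match either
            if n == 1:
                anchors.append((i, sub, reference.find(sub)))
                break
    return anchors


# Two near-identical repeat copies that differ at one position (T vs. A),
# separated by a short spacer. All sequences are made up for illustration.
copy1 = "ACGTTGCA"
copy2 = "ACGATGCA"
reference = copy1 + "GG" + copy2

# A read sampled from copy1 (reference positions 1..6).
read = "CGTTGC"

anchors = minimal_unique_substrings(read, reference)
# Each anchor votes for a placement: reference position minus read offset.
offsets = {ref_pos - read_off for read_off, _, ref_pos in anchors}

print(anchors)  # [(0, 'CGT', 1), (1, 'GT', 2), (2, 'TT', 3)]
print(offsets)  # {1}
```

In this toy case only the substrings that overlap the copy-specific base are unique, and they all vote for the same placement inside the first copy. Read suffixes such as `TGC` occur in both copies and therefore contribute no anchor.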