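De Bruijn sequence
De Bruijn graph
k-mer
Computer science
Genome
Telomere
Sequence assembly
Graph
Computational biology
Theoretical computer science
Algorithm
Biology
Combinatorics
Mathematics
Genetics
DNA
Gene
Gene expression
Transcriptome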
Authors
Anton Bankevich, Andrey V. Bzikadze, Mikhail Kolmogorov, Dmitry Antipov, Pavel A. Pevzner
Identifier
DOI:10.1101/2020.12.10.420448
Abstract
Although most existing genome assemblers are based on de Bruijn graphs, it remains unclear how to construct these graphs for large genomes and large k-mer sizes. This algorithmic challenge has become particularly important with the emergence of long high-fidelity (HiFi) reads that were recently utilized to generate a semi-manual telomere-to-telomere assembly of the human genome and to get a glimpse into biomedically important regions that evaded all previous attempts to sequence them. To enable automated assemblies of long and accurate reads, we developed a fast LJA algorithm that reduces the error rate in these reads by three orders of magnitude (making them nearly error-free) and constructs the de Bruijn graph for large genomes and large k-mer sizes. Since the de Bruijn graph constructed for a fixed k-mer size is typically either too tangled or too fragmented, LJA uses a new concept of a multiplex de Bruijn graph with varying k-mer sizes. We demonstrate that LJA improves on the state-of-the-art assemblers with respect to both accuracy and contiguity and enables automated telomere-to-telomere assemblies of entire human chromosomes.
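The trade-off behind the choice of k-mer size can be illustrated with a toy example. The sketch below is not the LJA algorithm and not a multiplex de Bruijn graph. It is a minimal textbook-style construction in which nodes are (k-1)-mers and each k-mer contributes one edge. The function names `de_bruijn_graph` and `count_branching_nodes` and the example read are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from error-free reads.

    Nodes are (k-1)-mers; every k-mer in a read adds an edge from its
    prefix (k-1)-mer to its suffix (k-1)-mer. Reads shorter than k
    contribute nothing.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def count_branching_nodes(graph):
    """Count nodes with more than one successor (a rough proxy for tangling)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    # Hypothetical read containing the repeat "ACG" twice.
    reads = ["ACGTACGA"]
    for k in (3, 6):
        graph = de_bruijn_graph(reads, k)
        print(k, len(graph), count_branching_nodes(graph))
```

For this read, k = 3 yields one branching node (`CG`, followed by both `GT` and `GA`) because the repeat is longer than k - 1, while k = 6 resolves the repeat and leaves no branching nodes. The opposite failure appears when k grows too large for the data: reads shorter than k, or neighbouring reads that overlap by fewer than k - 1 bases, share no nodes, so the graph breaks into disconnected pieces.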