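Keywords
De Bruijn graph
De Bruijn sequence
k-mer
Genome
Computational biology
Biology
Sequence assembly
Computer science
Algorithm
Genetics
Combinatorics
Mathematics
Gene
Gene expression
Transcriptome
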
Authors
Anton Bankevich, Andrey V. Bzikadze, Mikhail Kolmogorov, Dmitry Antipov, Pavel A. Pevzner

Identifiers
DOI: 10.1038/s41587-022-01220-6

Abstract
            Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
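To make the central data structure concrete, here is a minimal textbook sketch of de Bruijn graph construction. It is not LJA's algorithm: LJA relies on a Bloom filter and sparse de Bruijn graphs to scale to large genomes and large k, whereas this sketch uses an exact `Counter` and a dense adjacency map, so it only suits toy inputs. It also ignores reverse complements. Nodes are (k-1)-mers and each k-mer adds an edge from its prefix to its suffix. The `min_count` threshold is a hypothetical, simplistic stand-in for error handling: k-mers seen fewer times are treated as likely sequencing errors and dropped. The function names are illustrative.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def build_de_bruijn_graph(reads, k, min_count=1):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers.

    Each k-mer seen at least `min_count` times adds one edge from its
    (k-1)-prefix to its (k-1)-suffix. Rarer k-mers are discarded, a crude
    (illustrative) filter for k-mers introduced by sequencing errors.
    Reverse complements are not handled in this sketch.
    """
    graph = defaultdict(set)
    for kmer, count in count_kmers(reads, k).items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    # Toy reads; "GTTACGAT" carries a variant whose k-mers occur only once.
    reads = ["ACGTACGA", "CGTACGAT", "ACGTACGA", "GTTACGAT"]
    g = build_de_bruijn_graph(reads, k=4, min_count=2)
    for node in sorted(g):
        print(node, "->", sorted(g[node]))
```

With `min_count=2`, the singleton k-mers `GTTA` and `TTAC` are filtered out, leaving a small graph with the cycle `ACG -> CGT -> GTA -> TAC -> ACG` plus the branch `ACG -> CGA -> GAT`. A fixed k forces a trade-off between resolving repeats (large k) and keeping the graph connected (small k); the multiplex de Bruijn graph described in the abstract addresses this by allowing k-mer sizes to vary.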