Biology
Genome
Sequence assembly
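Benchmarking
Computational biology
DNA sequencing
Genetics
Word error rate
Gene
Computer science
Artificial intelligence
Business
Gene expression
Transcriptome
Marketing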
Authors
Wenjuan Yu,Haohui Luo,Jinbao Yang,Shengchen Zhang,Heling Jiang,Xianjia Zhao,Xingqi Hui,Dan Sun,Liang Li,Xiuqing Wei,Stefano Lonardi,Weihua Pan
Identifier
DOI:10.1101/gr.278232.123
Abstract
Pacific Biosciences (PacBio) HiFi sequencing technology generates long reads (>10 kbp) with very high accuracy (<0.01% sequencing error). Although several de novo assembly tools are available for HiFi reads, there are no comprehensive studies on the evaluation of these assemblers. We evaluated the performance of 11 de novo HiFi assemblers on (1) real data for three eukaryotic genomes; (2) 34 synthetic data sets with different ploidy, sequencing coverage levels, heterozygosity rates, and sequencing error rates; (3) one real metagenomic data set; and (4) five synthetic metagenomic data sets with different composition abundance and heterozygosity rates. The 11 assemblers were evaluated using quality assessment tool (QUAST) and benchmarking universal single-copy ortholog (BUSCO). We also used several additional criteria, namely, completion rate, single-copy completion rate, duplicated completion rate, average proportion of largest category, average distance difference, quality value, run-time, and memory utilization. Results show that hifiasm and hifiasm-meta should be the first choice for assembling eukaryotic genomes and metagenomes with HiFi data. We performed a comprehensive benchmarking study of commonly used assemblers on complex eukaryotic genomes and metagenomes. Our study will help the research community to choose the most appropriate assembler for their data and identify possible improvements in assembly algorithms.