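Genome
Computational biology
Completeness (order theory)
Biology
Continuity
Cobra
Virology
Gene
Genetics
Computer science
Ecology
Mathematics
Programming language
Mathematical analysis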
Authors
Lin-Xing Chen, Jillian F. Banfield
Identifier
DOI: 10.1038/s41564-023-01598-2
Abstract
Viruses are often studied using metagenome-assembled sequences, but genome incompleteness hampers comprehensive and accurate analyses. Contig Overlap Based Re-Assembly (COBRA) resolves assembly breakpoints based on the de Bruijn graph and joins contigs. Here we benchmarked COBRA using ocean and soil viral datasets. COBRA accurately joined the assembled sequences and achieved notably higher genome accuracy than binning tools. From 231 published freshwater metagenomes, we obtained 7,334 bacteriophage clusters, ~83% of which represent new phage species. Notably, ~70% of these were circular, compared with 34% before COBRA analyses. We expanded sampling of huge phages (≥200 kbp), the largest of which was curated to completion (717 kbp). Improved phage genomes from Rotsee Lake provided context for metatranscriptomic data and indicated the in situ activity of huge phages, whiB-encoding phages, and cysC- and cysH-encoding phages. COBRA improves the contiguity and completeness of viral genome assemblies, and thus the accuracy and reliability of analyses of gene content, diversity and evolution.
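The abstract does not spell out how COBRA joins contigs. As a rough illustration of the general idea of contig-overlap joining, here is a minimal Python sketch. It assumes that contigs from a de Bruijn graph assembler that were broken at a graph branch share an exact end overlap of a fixed length (the hypothetical parameter `overlap`, which in practice depends on the assembler's k-mer size). It is not COBRA's implementation. It only extends each contig in its forward orientation, looks for a partner contig (in either orientation) whose start matches that tail, and flags contigs whose end repeats their own start, a common signal that a genome is circular. All function names are hypothetical.

```python
from collections import defaultdict

# Translation table for DNA complementation (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_circular(seq: str, overlap: int) -> bool:
    """True if the last `overlap` bases repeat the first `overlap` bases."""
    return len(seq) > overlap and seq[:overlap] == seq[-overlap:]


def find_unique_joins(contigs: dict[str, str], overlap: int) -> dict[str, str]:
    """Map 'name+' (a contig's forward 3' end) to the single contig end
    ('other+' or 'other-') whose first `overlap` bases match that tail.
    Tails with zero or several candidate partners are left unjoined."""
    # Index the first `overlap` bases of every contig in both orientations.
    starts: dict[str, list[str]] = defaultdict(list)
    for name, seq in contigs.items():
        starts[seq[:overlap]].append(name + "+")
        starts[revcomp(seq)[:overlap]].append(name + "-")

    joins = {}
    for name, seq in contigs.items():
        tail = seq[-overlap:]
        # Ignore self-matches; those are handled by is_circular().
        partners = [p for p in starts.get(tail, []) if p[:-1] != name]
        if len(partners) == 1:
            joins[name + "+"] = partners[0]
    return joins


def join_pair(left: str, right: str, overlap: int) -> str:
    """Merge two sequences that share an exact `overlap`-base junction."""
    assert left[-overlap:] == right[:overlap], "ends do not overlap"
    return left + right[overlap:]


if __name__ == "__main__":
    contigs = {"A": "ATGCGTACCA", "B": "ACCAGGTTT"}
    print(find_unique_joins(contigs, 4))                 # -> {'A+': 'B+'}
    print(join_pair(contigs["A"], contigs["B"], 4))      # -> ATGCGTACCAGGTTT
```

The one-partner rule is a design choice of this sketch: when a contig end has several possible extensions (a branch in the graph), it is left unjoined rather than guessed, which favors accuracy over contiguity.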