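Genome
Inference
Profiling (computer programming)
Computational biology
Biology
Computer science
Evolutionary biology
Genetics
Gene
Artificial intelligence
Operating system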
Authors
Yucheng Wang,Thorfinn Sand Korneliussen,Luke E. Holman,Andrea Manica,Mikkel Winther Pedersen
Identifier
DOI:10.1111/2041-210x.14006
Abstract
Metagenomic data generated from environmental samples is increasingly common in the analysis of modern and ancient biological communities. To obtain taxonomic profiles from this type of data, DNA sequences are aligned against large genomic reference databases, and the lowest common ancestor (LCA) needs to be inferred for each sequence with multiple alignments. To date, efforts have mainly focused on improving the speed, sensitivity and specificity of alignment tools, and little effort has been applied to the LCA algorithm that generates the taxonomic profiles from alignments. We present ngsLCA, a command-line toolkit with two separate modules: the main program (in C/C++) performing LCA inference, and an R package for generating tables and visualisations of the taxonomic profiles. ngsLCA processed large datasets in BAM/SAM alignment format 4–11 times faster and used less memory compared to other available programs. It is compatible with the NCBI taxonomy and has flexible parameter settings. Furthermore, the toolkit offers functions for filtering, contamination removal, taxonomic clustering, and multiple ways of visualising the generated taxonomic profiles. ngsLCA bridges a gap in current metagenomic analyses by supplying a computationally light, easy-to-use, accurate, fast and flexible LCA algorithm with R functions for processing and illustrating the taxonomic profiles.
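The LCA step described in the abstract can be illustrated with a minimal sketch. This is not the ngsLCA implementation (which is written in C/C++ and reads BAM/SAM files); it only shows the general idea of reducing the taxa hit by one read's alignments to their deepest shared ancestor. The `PARENT` table, its integer IDs, and the function names `lineage` and `lowest_common_ancestor` are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List

# Hypothetical toy taxonomy: child ID -> parent ID (the root points to itself).
# The IDs are made up for illustration; they are not real NCBI taxids.
PARENT: Dict[int, int] = {
    1: 1,    # root
    10: 1,   # family
    20: 10,  # genus A
    21: 20,  # species A1
    22: 20,  # species A2
    30: 10,  # genus B
    31: 30,  # species B1
}


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from the root down to `taxid` (root first)."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path[::-1]


def lowest_common_ancestor(taxids: Iterable[int], parent: Dict[int, int]) -> int:
    """Return the deepest node shared by the lineages of all given taxa."""
    lineages = [lineage(t, parent) for t in set(taxids)]
    if not lineages:
        raise ValueError("at least one taxid is required")
    lca = lineages[0][0]  # every lineage starts at the root
    # Walk the lineages in parallel from the root until they diverge.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


# A read aligning to two species of the same genus is assigned to that genus.
print(lowest_common_ancestor([21, 22], PARENT))  # 20
# A read aligning to species in different genera is assigned to the family.
print(lowest_common_ancestor([21, 31], PARENT))  # 10
```

A read with a single alignment keeps its own taxon, while reads that hit many distantly related references are pushed toward the root, which is why the choice of LCA parameters affects the resolution of the resulting taxonomic profile.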