Authors
Duy Tin Truong, Eric A. Franzosa, Timothy L. Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower, Nicola Segata
Abstract
MetaPhlAn [1] (Metagenomic Phylogenetic Analysis) is an efficient method to taxonomically characterize whole-metagenome shotgun (WMS) samples that has been successfully employed in large-scale microbial community studies [2,3]. This work complements the original species-level profiling method with a system for eukaryotic and viral quantitation, strain-level identification, and strain tracking. These and other extensions make MetaPhlAn2 (http://segatalab.cibio.unitn.it/tools/metaphlan2/) an efficient method for mining WMS samples.

The method employs clade-specific markers to unequivocally assign reads to taxonomic clades, estimate these markers' coverages, and profile their presence and abundance [1]. With a ~10× increase in sequenced genomes in the last two years, MetaPhlAn2 now includes ~1M markers (average 184 ± 45 for bacterial species) from >7,500 species (Supplementary Tables 1-3). Sub-species markers enable strain-level analyses, and quasi-markers improve accuracy and allow the detection of viruses and eukaryotic microbes (full list of additions in Supplementary Notes 1-3).

We validated MetaPhlAn2 using synthetic metagenomes (SMs, Supplementary Note 4). On the 24 SMs comprising 656 million reads and 1,295 species, MetaPhlAn2 proved more accurate than mOTU [4] and Kraken [5] (average correlation 0.95 ± 0.05 against 0.80 ± 0.21 and 0.75 ± 0.22, Fig. 1a and Supplementary Figs. 1-8) with fewer false positives and false negatives (average 10-12 against 22-27 and 23-27), even when including genomes not in the reference database (see Supplementary Note 4). With the adoption of fast mappers (Bowtie2) and support for parallelism, MetaPhlAn2 is >10× faster than MetaPhlAn1 and in line with the speed of the other tested approaches (Supplementary Fig. 9).

We applied MetaPhlAn2 to four elbow skin samples that we sequenced from three subjects (Fig. 1b, Supplementary Note 5). Propionibacterium acnes and Staphylococcus epidermidis dominate these sites, in agreement with expected genus-level results [6] while providing species-level resolution. Together with these core species, we found Malassezia globosa in 93.65% of samples and confirmed it by coverage analysis (Supplementary Fig. 10). Although M. globosa is a known colonizer of the skin, its metagenomic characterization highlights the ability of MetaPhlAn2 to identify non-prokaryotic species. Phages (e.g., for Propionibacterium) as well as double-stranded DNA viruses of the Polyomavirus genus are also consistently detected. We then expanded the profiling to the whole set of 982 samples from other body sites from the HMP, including newly sequenced time points, as discussed in Supplementary Note 6.

Tracking microbes across samples has been performed extensively with culture-dependent approaches, and MetaPhlAn2 now offers this possibility in a culture-independent setting by fingerprinting the microbiome at the strain level. This is illustrated on the multiple time point (n = 3) HMP/HMP1-II dataset, in which species-specific strain fingerprints are subject-specific and conserved longitudinally (Supplementary Note 7).
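
The marker-based quantification described above (reads assigned to clade-specific markers, per-marker coverage, then clade abundance) can be illustrated with a minimal Python sketch. This is not the MetaPhlAn2 implementation: the function names, the toy marker table, the fixed read length, and the use of a trimmed mean to summarize marker coverages are all assumptions made for illustration, and the text above does not specify how MetaPhlAn2 aggregates or normalizes coverages.

```python
from collections import defaultdict
from statistics import mean


def marker_coverage(read_hits, marker_lengths, read_length):
    """Estimate per-marker coverage as mapped bases divided by marker length.

    read_hits: iterable of marker IDs, one entry per uniquely mapped read
               (hypothetical input; a real pipeline would parse mapper output).
    marker_lengths: dict mapping marker ID -> marker length in bases.
    read_length: assumed fixed read length in bases.
    """
    counts = defaultdict(int)
    for marker in read_hits:
        counts[marker] += 1
    # Markers with no hits get coverage 0.0; hits to unknown markers are ignored.
    return {m: counts[m] * read_length / length
            for m, length in marker_lengths.items()}


def clade_relative_abundance(coverages, marker_to_clade, trim=0.1):
    """Summarize marker coverages per clade and normalize to sum to 1.

    A trimmed mean (dropping the lowest and highest `trim` fraction of
    markers) is used here as an assumed robust summary, not as the
    published method.
    """
    per_clade = defaultdict(list)
    for marker, cov in coverages.items():
        per_clade[marker_to_clade[marker]].append(cov)

    clade_cov = {}
    for clade, covs in per_clade.items():
        covs = sorted(covs)
        k = int(len(covs) * trim)  # number of markers trimmed at each end
        kept = covs[k:len(covs) - k] if len(covs) > 2 * k else covs
        clade_cov[clade] = mean(kept)

    total = sum(clade_cov.values())
    if total == 0:
        return {clade: 0.0 for clade in clade_cov}
    return {clade: cov / total for clade, cov in clade_cov.items()}


# Toy example with hypothetical markers for two species.
marker_lengths = {"A_m1": 1000, "A_m2": 500, "B_m1": 1000}
marker_to_clade = {"A_m1": "species_A", "A_m2": "species_A", "B_m1": "species_B"}
read_hits = ["A_m1"] * 30 + ["A_m2"] * 15 + ["B_m1"] * 10

coverages = marker_coverage(read_hits, marker_lengths, read_length=100)
abundance = clade_relative_abundance(coverages, marker_to_clade)
print(abundance)  # {'species_A': 0.75, 'species_B': 0.25}
```

In this toy input both markers of species_A reach 3× coverage and the single marker of species_B reaches 1×, so the normalized abundances are 0.75 and 0.25. Normalizing by marker length before averaging keeps long markers from dominating the estimate, which is the main design point the sketch is meant to convey.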