Inferring Correlation Networks from Genomic Survey Data

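Keywords: Spurious relationship; Compositional data; Human Microbiome Project; Microbiome; Profiling (computer programming); Correlation; Human microbiome; Identification (biology); Computational biology; Biology; Computer science; Data mining; Evolutionary biology; Ecology; Bioinformatics; Machine learning; Mathematics; Geometry; Operating system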

Authors

Jonathan Friedman, Eric J. Alm

Source

Journal: PLOS Computational Biology [Public Library of Science]
Date: 2012-09-20 Volume (issue): 8 (9): e1002687 Citations: 2234

Links

plos.org osti.gov doaj.org europepmc.org handle.net mit.edu nih.gov doi.org

Identifiers

DOI: 10.1371/journal.pcbi.1002687

Abstract

High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
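The compositional effect described in the abstract can be illustrated with a small simulation. The sketch below is not SparCC and is not taken from the paper; it only shows, under assumed settings, how converting independent absolute abundances into relative fractions induces negative correlations, and how that effect weakens as the number of taxa grows. The function names, the lognormal abundance model, the sample size, and the use of the number of equally abundant taxa as a stand-in for community diversity are all assumptions made for illustration.

```python
import numpy as np


def mean_offdiag_corr(x):
    """Mean Pearson correlation over all distinct pairs of columns in x."""
    c = np.corrcoef(x, rowvar=False)
    d = c.shape[0]
    return (c.sum() - np.trace(c)) / (d * (d - 1))


def simulate(n_taxa, n_samples=500, seed=0):
    """Return mean pairwise correlation of absolute and relative abundances.

    Absolute abundances are drawn independently per taxon (an assumed
    lognormal model), so any correlation among the fractions is an
    artifact of normalizing each sample to sum to 1.
    """
    rng = np.random.default_rng(seed)
    absolute = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_taxa))
    fractions = absolute / absolute.sum(axis=1, keepdims=True)
    return mean_offdiag_corr(absolute), mean_offdiag_corr(fractions)


# Few taxa (low diversity proxy) versus many taxa (high diversity proxy).
low_abs, low_frac = simulate(n_taxa=3)
high_abs, high_frac = simulate(n_taxa=200)

print(f"3 taxa:   absolute {low_abs:+.3f}, fractions {low_frac:+.3f}")
print(f"200 taxa: absolute {high_abs:+.3f}, fractions {high_frac:+.3f}")
```

Because the fractions in each sample sum to 1, their covariances must sum to zero, which pushes the average pairwise correlation toward roughly -1/(D-1) for D similar taxa. With 3 taxa the fractions appear strongly anticorrelated even though the underlying abundances are independent, while with 200 taxa the artifact is close to zero.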
